Syllabus
GEO5165C: Quantitative Geography
Contact information
- Instructor Name: Professor James B. Elsner (he/him/his)
- Instructor Location: Bellamy Building, Room 323a
- Lesson Hours: TR 3:05-4:20 p.m.
- Student Hours: TR 9-10 a.m., 2-3 p.m.
Email: jelsner@fsu.edu
Course description
This course is an introduction to the quantitative analysis of geographic data (data analysis for geographers). Most of the course content will be available through Canvas and through RStudio Cloud. Please open an account with RStudio Cloud at (https://rstudio.cloud).
Please use this link https://rstudio.cloud/spaces/12733/join?access_code=NuhGFcK71GlGuzoKzAUIe1lqgcMDyOIC7UnnFtNG to join the Spatial Data Analysis workspace on RStudio Cloud.
Expected learning outcomes
You will describe and demonstrate the principles of data science. You will do this with a grammar for manipulating data and a grammar for making graphs. The grammars are implemented in R using the syntax of tidyverse.
Materials
- You will need access to the internet and either an iPad, laptop, or desktop computer.
- All course materials are available through RStudio Cloud (https://rstudio.cloud/spaces/12733/projects) and archived on Canvas.
- There is no required textbook.
- Much of the material for the course comes from the online book: R for Data Science https://r4ds.had.co.nz/
- Additional help is available online (e.g., https://tinystats.github.io/teacups-giraffes-and-statistics/index.html)
Class meetings
- Online: synchronous, interactive, asynchronous recordings available on Canvas
- Some lectures, lots of learn-by-doing
Grades
- Grades are determined solely by how well you do on the regularly scheduled homework/classwork assignments.
- There are NO quizzes, tests, or exams.
- Synchronous attendance is expected but not required.
- Late classwork or homework assignments will not be accepted.
- Cumulative numerical averages of 90 - 100 (outstanding) are guaranteed at least an A-, 80 - 89 (good) at least a B-, and 70 - 79 (satisfactory) at least a C-. However, the exact ranges for letter grades will be determined after all work is complete.
Academic honor code
Students With Disabilities Act
Students needing academic accommodation should: (1) register with and provide documentation to the Student Disability Resource Center (https://dos.fsu.edu/sdrc/); (2) bring a letter to me indicating the need for accommodation and what type. This should be done sometime during the first week of classes.
Inclusiveness
It is my intent to present materials and activities that are respectful of diversity: gender identity, sexuality, disability, age, socioeconomic status, ethnicity, race, nationality, religion, and culture. Let me know ways to improve the effectiveness of the course for you personally, or for other students or student groups.
- If you have a name and/or set of pronouns that differ from those that appear in your official FSU records, please let me know.
- If you feel your performance is being impacted by your experiences outside of class, please don’t hesitate to come and talk with me. If you prefer to speak with someone outside of the course, your academic dean is an excellent resource.
- If something was said in class (by anyone) that made you feel uncomfortable, please talk to me about it.
More about your instructor
Syllabus change policy
This syllabus is a guide for the course and is subject to change with advance notice.
Schedule (subject to change with notice)
| Week | Dates | Topic |
|---|---|---|
| 1 | August 24, 26, 28 | RStudio Cloud and R |
| 2 | August 31, September 2, 4 | Working with R |
| 3 | September 9, 11 | Data and data frames |
| 4 | September 14, 16, 18 | Data analysis |
| 5 | September 21, 23, 25 | Graphical analysis |
| 6 | September 28, 30, October 1 | Mapping |
| 7 | October 5, 7, 9 | Bayesian data analysis |
| 8 | October 12, 14, 16 | Regression |
| 9 | October 19, 21, 23 | Multiple regression |
| 10 | October 26, 28, 30 | Regression trees |
| 11 | November 2, 4, 6 | Spatial data |
| 12 | November 9, 13 | Spatial autocorrelation |
| 13 | November 16, 18, 19 | Spatial autocorrelation |
| 14 | November 30, December 2, 3 | Geographic regression |
I will cover new material on Mondays and Wednesdays. On Fridays you will work on your assignment. Assignments are due Fridays at 5 p.m.
| Assignment | Due Date (no later than 5 pm) |
|---|---|
| 1 | August 28 |
| 2 | September 4 |
| 3 | September 11 |
| 4 | September 25 |
| 5 | October 1 |
| 6 | October 9 |
| 7 | October 16 |
| 8 | October 23 |
| 9 | October 30 |
| 10 | November 19 |
| 11 | December 3 |
Other materials to check out
Best practices for working with R
- Develop a project-oriented workflow and don’t use setwd().
- Don’t hard-code file path names.
- Use version control (GitHub).
- Manage package dependencies.
- Pick a style of writing code and stick with it.
- Do as much as you can in R Markdown (notes, lectures, slides, etc.).
- Check out Openscapes.
- Towards a more open and reproducible approach to geographic and spatial data science
- Opening practice: supporting reproducibility and critical spatial data science
Julia programming language
Julia programming language https://julialang.org/ Download > Open
Jupyter Notebook
(1) Anaconda > Individual > Download > Install
(2) In the Julia REPL type:
using Pkg
Pkg.add("IJulia")
(3) Then click on the Anaconda-Navigator icon and launch Jupyter Notebook
(4) Click on the New button and select Julia.
Problems? Watch https://www.youtube.com/watch?v=oyx8M1yoboY
Pluto Notebook
In the Julia REPL type:
import Pkg; Pkg.add("Pluto")
import Pluto
Pluto.run()
md"""
# This Pluto notebook is a test.
"""
begin
    a = [1, 4, 7, 22]
    a * 10
end
Tuesday, August 23, 2022
- Is it getting hotter here in Tallahassee?
- Are Atlantic hurricanes getting stronger?
Data science (formerly known as ‘statistics’) is an exciting discipline that allows you to turn data into understanding, insight, and knowledge.
Today
- Understand what this course is about, how it is structured, and my expectations for you
- Start working with RStudio and R.
What is this course?
This course is designed as a first course in data science for geographers.
Q - What statistics background does this course assume?
A - None.
Q - Is this an intro stat course?
A - Statistics and data science are closely related with much overlap. Hence, this course is a great way to get started with statistics. But this course is not your typical high school/college statistics course.
Q - Will you be doing computing?
A - Yes.
Q - Is this an introduction to computer science course?
A - No, but many themes are shared.
Q - What computing language will you learn?
A - R.
Q - Why not some other language?
A - We can discuss that over coffee.
Join RStudio Cloud
- Go to RStudio Cloud, and log in.
- Click on this link https://rstudio.cloud/spaces/75344/join?access_code=drxxMUzCFzMd2hUzGi5EZZaR9CgXY2jJVC6mz54L to join the Quantitative Geography Using R Space on RStudio Cloud.
Examples
Some of my recent research:
Other research:
- A year as told by fitbit by Nick Strayer
- R-Ladies global tour by Maelle Salmon
Course Syllabus
- Log on to RStudio Cloud and click on this course’s Space (Quantitative Geography Using R).
- Click on the project 00_Syllabus and launch it.
- Open the 00-Syllabus.Rmd file (lower-right panel), and then click on the “Knit” button.
- Review
Tallahassee daily temperatures
- Packages > Install
- In the Packages window, type tidyverse, lubridate, here, ggplot2 then select Install
Get the data into your environment.
TLH.df <- readr::read_csv(file = here::here('data', 'TLH_SOD1892.csv'),
                          show_col_types = FALSE) |>
  dplyr::filter(STATION == 'USW00093805') |>
  dplyr::mutate(Date = as.Date(DATE)) |>
  dplyr::mutate(Year = lubridate::year(Date),
                Month = lubridate::month(Date),
                Day = lubridate::day(Date),
                doy = lubridate::yday(Date)) |>
  dplyr::select(Date, Year, Month, Day, doy, TMAX, TMIN, PRCP)
The functions above are called with the syntax package::function (:: is called a library specifier).
Or, load the packages into your current environment with the library() function in the file above where they are first used.
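As a minimal sketch of the two calling styles, the example below uses the stats package, chosen here only because it ships with R and needs no installation. The temperature values are made up for illustration.

```r
temps <- c(71, 68, 90, 85, 77)   # made-up values for illustration

# Style 1: name the package explicitly with ::
stats::median(temps)

# Style 2: attach the package once, then call the function by name
library(stats)
median(temps)
```

Both calls run the same function. The :: form makes it obvious where a function comes from, while library() saves typing when you use many functions from one package.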
Create a plot of the frequency of high temperatures.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
library(ggplot2)
TLH.df |>
  group_by(TMAX) |>
  summarize(nH = n()) |>
  ggplot(mapping = aes(x = TMAX, y = nH)) +
  geom_col(col = 'white', fill = "gray70") +
  labs(title = "Frequency of Daily High Temperatures",
       subtitle = "Tallahassee, FL, USA (1940-2018)",
       x = "Daily High Temperature (°F)",
       y = "Number of Days") +
  scale_x_continuous(breaks = seq(from = 20, to = 110, by = 10)) +
  theme_minimal()
## Warning: Removed 1 rows containing missing values (position_stack).

Thursday, August 25, 2022
Today
Data science: reproducibility, communication, and automation
Structure of markdown files
How to make a simple plot
Everything you create is an object
Turn off your camera.
Any questions about my grading of your assignments?
Make sure (1) you are watching (or at least listening) to me via Zoom, and (2) you have a copy of the 02_Lesson project and have the 02-Lesson.Rmd file open.
Follow along in your copy of the lesson as I go line by line through the file on Zoom.
Your file’s background and text might look different from mine. If so, go to Tools > Global Options > Appearance and select Cobalt.
Much of the lesson materials come from online books: https://www.bigbookofr.com/index.html
Datasets: https://kieranhealy.org/blog/archives/2020/08/25/some-data-packages/
Data Analysis
Data analytics are done on a computer. You have two choices: use a spreadsheet or write code.
Spreadsheets are convenient, but they make the three conditions for a good data analysis (reproducibility, communication, and automation) difficult to achieve.
Reproducibility
A scientific paper is an advertisement for a claim. But the proof is the procedure that was used to obtain the result.

If your analysis is to be convincing, the trail from the data you started with to the final output must be available to the public. A reproducible trail with a spreadsheet is hard. It is easy to make mistakes (e.g., accidentally sorting just a column rather than the entire table).
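A small sketch of the sorting mistake mentioned above, using a made-up data frame (the cities and temperatures are illustrative only). Sorting one column on its own silently breaks the link between values in a row, whereas reordering whole rows keeps each observation intact.

```r
# Hypothetical data for illustration
obs <- data.frame(
  city = c("Tallahassee", "Miami", "Pensacola"),
  tmax = c(92, 95, 89)
)

# Mistake: sort only one column; cities no longer match their temperatures
broken <- obs
broken$tmax <- sort(broken$tmax)
broken

# Correct: reorder entire rows by the sorting column
sorted <- obs[order(obs$tmax), ]
sorted
```

In code the mistake is at least visible and repeatable, so a reader can spot it. In a spreadsheet the same slip leaves no trace.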
A set of instructions written as computer code is the exact procedure. (Open stronger-hur.Rmd).
Communication
Code is a recipe for what you did. It communicates precisely what was done. Communication to others and to your future self.
It’s hard to explain to someone precisely what you did when working with a spreadsheet. Click here, then right click here, then choose menu X, etc. The words needed to describe these procedures are not standard. Code is an efficient way to communicate because all important information is given as plain text with no ambiguity.
Automation
If you’ve ever made a map using a geographic information system (GIS) you know how hard it is to make another one with a new set of data (even a very similar one). Running code with new data is simple.
Being able to code is an important skill for nearly all technical jobs. Here you will learn how to code. But keep in mind: Just like learning to write doesn’t mean you will be a writer (i.e., make a living writing), learning to code doesn’t mean you will be a coder.
The R programming language
- R is a leading open-source programming language for data science, alongside Python.
- Free, open-source, runs on Windows, Macs, etc. Excellent graphing capabilities. Powerful, extensible, and relatively easy to learn syntax. Thousands of functions.
- Has all the cutting edge statistical methods including methods in spatial statistics.
- Used by scientists of all stripes. Most of the world’s statisticians use it (and contribute to it).
Overview of this course
We start with making graphs. You will make clear, informative plots that will help you understand your data. You will learn the basic structure of making a plot.
Visualization alone is not enough, so you will also learn the key verbs that allow you to select important variables, filter out key observations, create new variables, and compute summaries (data wrangling).
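As a rough preview of those verbs, here is a sketch using functions from {dplyr} on the airquality data frame that comes with R. The choice of data, the Celsius conversion, and the names TempC, meanTempC, nDays, and monthly are assumptions made only for this illustration.

```r
library(dplyr)

monthly <- airquality |>
  select(Month, Day, Temp, Ozone) |>        # select variables
  filter(!is.na(Ozone)) |>                  # keep observations with an ozone value
  mutate(TempC = (Temp - 32) * 5 / 9) |>    # create a new variable (degrees Celsius)
  group_by(Month) |>
  summarize(meanTempC = mean(TempC),        # compute summaries for each month
            nDays = n())
monthly
```

Each verb takes a data frame and returns a data frame, which is why the steps can be chained with the pipe |>.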
You will then combine data wrangling and visualization with your curiosity to ask and answer interesting questions by learning how to fit models to your data. Data models extend your ability to ask and answer questions about the world you live in.
With geographic and environmental data collected at different locations these models will include a spatial component.
Work in plain text, using R Markdown
The ability to reproduce your work is important to a scientific process. It is also pragmatic. The person most likely to reproduce your work a few months later is you.
This is especially true for graphs and figures. These often have a finished quality to them as a result of tweaking and adjustments to the details. This makes it hard to reproduce them later.
The goal is to do as much of this tweaking as possible with the code you write, rather than in a way that is invisible (retrospectively). Contrast editing an image in Adobe Illustrator.
You will find yourself constantly going back and forth between three things:
Writing code. You will write code to produce plots. You will also write code to load your data (get your data into R) and to look quickly at tables of that data. Sometimes you will want to summarize, rearrange, subset, or augment your data, or fit a statistical model to it. You will want to be able to write that code as easily and effectively as possible.
Looking at output. Your code is a set of instructions that produces the output you want: a table, a model, or a figure. It is helpful to be able to see that output.
Taking notes. You will also write about what you are doing, and what your results mean.
To do these things efficiently you want to write your code together with comments. This is where markdown comes in (files that end with .Rmd)
An R Markdown file is a plain text document where text (such as notes or discussion) is interspersed with pieces, or chunks, of R code. When you Knit this file the R code is executed piece by piece, in sequence starting at the top of the file, and the output either supplements or replaces the chunks of code.
The resulting file is then converted into a more easily-readable document formatted in HTML, PDF, or Word. The non-code segments of the document are plain text with simple formatting instructions (e.g., ## for section header).
There is a set of conventions for marking up plain text in a way that indicates how it should be formatted. Markdown treats text surrounded by asterisks, double asterisks, and backticks in special ways. It is R Markdown’s way of saying that these words are in
- *italics*
- _also italics_
- **bold**, and
- `code font`
Your class notes include code. There is a set format for including code into your markdown file (lines of code; code chunk). They look like this:
library(ggplot2)
The chunk is enclosed by markings that I call code chunk delimiters.
The opening delimiter is three back ticks (on a U.S. keyboard, the character under the escape key) followed by a pair of curly braces containing the name of the language you are using. The format is language-agnostic and can be used with, e.g., Python and other languages.
The back ticks-and-braces signals that what follows is code. You write your code as needed, and then end the chunk with a new line containing three more back ticks.
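To make the delimiters concrete, here is a minimal sketch that writes a tiny R Markdown file from R and prints it back. The file is a temporary one and the note text is made up; the delimiter is built with strrep() only so that the back ticks can be shown inside this example.

```r
# Build the three-back-tick delimiter programmatically
fence <- strrep("`", 3)

rmd_lines <- c(
  "Some notes written as plain text.",
  "",
  paste0(fence, "{r}"),   # opening delimiter: back ticks plus language in braces
  "library(ggplot2)",     # the code itself
  fence                   # closing delimiter: three back ticks on a new line
)

# Write to a temporary file (hypothetical, for illustration) and print it
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)
cat(readLines(rmd_file), sep = "\n")
```

The printed lines show the layout of a chunk: an opening delimiter, the code, and a closing delimiter, with ordinary text outside.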
If you keep your notes in this way, you will be able to see the code you wrote, the output it produces, and your own commentary or clarification on what the code did, all in a convenient way. Moreover, you can turn it into a good-looking document straight away with the Knit button.
This is how you will do everything in this course. In the end you will have a set of notes that you can turn into a book with bookdown.
Visualizing data
To help motivate your interest in this course, we start by making a graph. There are three things to learn:
- How to create graphs with a reusable {ggplot2} template
- How to add variables to a graph with aesthetics
- How to select the ‘type’ of your graph with geoms
The following examples are taken from R for Data Science by Hadley Wickham and Garrett Grolemund, published by O’Reilly Media, Inc., 2016, ISBN: 9781491910399. https://r4ds.had.co.nz/.
A code template
Let’s begin with a question to explore.
What do you think: Do cars with big engines use more fuel than cars with small engines?
- A: Cars with bigger engines use more fuel.
- B: Cars with bigger engines use less fuel.
You check your answer with two things: the mpg data that comes in {ggplot2} and a plot. The mpg object contains observations collected on 38 models of cars by the US Environmental Protection Agency. Among the variables in mpg are:
- displ, a car’s engine size, in liters.
- hwy, a car’s fuel efficiency on the highway, in miles per gallon (mpg).
A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance.
To see a portion of the mpg data, type mpg after you have loaded the package using the library() function.
library(ggplot2)
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # … with 224 more rows
You see the first 10 rows of the data. Note that there are 234 rows and 11 columns, so you are viewing only a portion of the table.
Each row is a different car. The first row is the Audi A4 1999 model with automatic transmission (5 gears). The tenth car listed is the Audi A4 Quattro with manual transmission (6 gears).
The column labeled displ is the engine size in liters. A bigger number means the car has a larger engine. The column labeled hwy is the highway fuel efficiency in miles per gallon. A bigger number means the car uses less fuel to go the same distance (higher efficiency).
It is hard to check which answer is correct by looking only at these 10 cars. Note that bigger engines appear to have smaller values of highway mileage but it is far from clear.
You want to look at all 234 cars.
The code below uses functions from the {ggplot2} package to plot the relationship between displ and hwy for all cars.
Let’s look at the plot and then talk about the code itself. To see the plot, click on the little green triangle in the upper right corner of the gray shaded region.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
The plot shows an inverse relationship between engine size (displ) and fuel efficiency (hwy). Each point is a different car. Cars that have a large value of displ tend to have a small value of hwy and cars with a small value of displ tend to have a large value of hwy.
In other words, cars with big engines use more fuel. If that was your hypothesis, you were right!
Now let’s look at how you made the plot.
The code
Here’s the code used to make the plot. Notice that it contains three functions: ggplot(), geom_point(), and aes().
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
The first function, ggplot(), creates a coordinate system that you can add layers to. The first argument of ggplot() is the dataset to use in the graph.
By itself, ggplot(data = mpg) creates an empty graph, but it is not very interesting so I’m not going to show it here.
The function geom_point() adds a layer of points to the empty plot created by ggplot(). As a result, you get a scatterplot.
The function geom_point() takes a mapping argument, which defines which variables in your dataset are mapped to which axes in your graph. The mapping argument is always paired with the function aes(), which you use to bring together the mappings you want to create.
Here, you want to map the displ variable to the x axis (horizontal axis) and the hwy variable to the y axis (vertical axis), so you add x = displ and y = hwy inside of aes() (and you separate them with a comma). Where will ggplot() look for these mapped variables? In the data frame that you passed to the data argument, in this case, mpg.
- Knit to generate HTML.
- Compare the HTML with the Rmd.
A graphing workflow
The code above follows the common work flow for making graphs. To make a graph, you:
- Start the graph with ggplot()
- Add elements to the graph with a geom_ function
- Select variables with the mapping = aes() argument
A graphing template
In fact, you can turn your code into a reusable template for making graphs. To make a graph, replace the bracketed sections in the code below with a data set, a geom_ function, or a collection of mappings.
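A sketch of such a template, following the pattern used in R for Data Science. The angle-bracket placeholders are kept in comments because they are not valid R until replaced, and the object name p is an assumption made for illustration.

```r
library(ggplot2)

# Template: replace each bracketed section before running
# ggplot(data = <DATA>) +
#   <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# One filled-in instance: data = mpg, geom = geom_point, mappings = displ and hwy
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
p
```

Storing the plot in an object is optional. Typing the object name draws the plot, just as running the unassigned code does.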
Give it a try!
- Copy and paste the above code chunk, including the code chunk delimiters, and replace the y = hwy with y = cty.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = cty))
- Replace the bracketed sections < > with mpg, geom_boxplot, and x = class, y = hwy to make a slightly different graph.
ggplot(data = mpg) +
  geom_boxplot(mapping = aes(x = class, y = hwy))
Common problems
As you start to work with R code, you are likely to run into problems. Don’t worry — it happens to everyone. I’ve been writing R code for decades, and I still write code that doesn’t work!
Start by comparing the code that you are running to the code in the examples in these notes. R is extremely picky, and a misplaced character can make all the difference. Make sure that every ( is matched with a ) and every " is paired with another ". Also pay attention to capitalization; R is case sensitive.
location of the + sign
One common problem when creating {ggplot2} graphics is to put the + in the wrong place: it must come at the end of a line, not the start. In other words, make sure you haven’t accidentally written code like this:
ggplot(data = mpg)
+ geom_point(mapping = aes(x = displ, y = hwy))
help
If you’re still stuck, try the help. You can get help about any R function by running ?function_name in a code chunk, e.g. ?geom_point. Don’t worry if the help doesn’t seem that helpful — instead skip down to the bottom of the help page and look for a code example that matches what you’re trying to do.
If that doesn’t help, carefully read the error message that appears when you run your (non-working) code. Sometimes the answer will be buried there! But when you’re new to R, you might not yet know how to understand the error message. Another great tool is Google: try googling the error message, as it’s likely someone else has had the same problem, and has gotten help online.
Things to know
You are getting oriented to the language itself (what happens at the console), while learning to take notes in what might seem like an odd format (chunks of code interspersed with plain-text comments), in an IDE (integrated development environment) that has many features designed to make your life easier in the long run, but which can be hard to decipher at the beginning. Here are some general points to keep in mind about how R is designed. They might help you get a feel for how the language works.
Everything has a name
In R, everything you deal with has a name. You refer to things by their names as you examine, use, or modify them. Named entities include variables (like x, or y), data that you have loaded (like my_data), and functions that you use. (More about functions soon.) You will spend a lot of time talking about, creating, referring to, and modifying things with names.
Things are listed under the Environment tab in the upper right panel.
Some names are forbidden. These include reserved words like FALSE and TRUE, core programming words like Inf, for, else, break, function, and words for special entities like NA and NaN. (These last two are codes designating missing data and “Not a Number,” respectively.) You probably won’t use these names by accident, but it’s good to know that they are not allowed.
Some names you should not use, even if they are technically permitted. These are mostly words that are already in use for objects or functions that form part of the core of R. These include the names of basic functions like q() or c(), common statistical functions like mean(), range() or var(), and built-in mathematical constants like pi.
Names in R are case sensitive. The object my_data is not the same as the object My_Data. When choosing names for things, be concise, consistent, and informative. Follow the style of the tidyverse and name things in lower case, separating words with the underscore character, _, as needed. Do not use spaces when naming things, including variables in your data.
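A short sketch of these naming rules. The object my_data is made up for illustration, and the second check assumes you have not created an object called My_Data yourself.

```r
my_data <- c(3, 5, 8)

exists("my_data")   # TRUE: this object was just created
exists("My_Data")   # FALSE, assuming no object with this capitalization exists

# Reserved words cannot be used as names; uncommenting the next line gives an error
# TRUE <- 5
```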
Everything is an object
Some objects are part of R, some are added via packages, and some are created by you. But almost everything is some kind of object. The code you write will create, manipulate, and use named objects.
Let’s create a vector of numbers. The command c() is a function. It’s short for “combine” or “concatenate.” It will take a sequence of comma-separated things inside the parentheses and join them into a vector where each element is still accessible.
c(1, 2, 3, 1, 3, 5, 25)
## [1]  1  2  3  1  3  5 25
Instead of sending the result to the console, here you instead assign the result to an object.
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
your_numbers <- c(5, 31, 71, 1, 3, 21, 6)
To see what you created, type the name of the object and hit return.
my_numbers
## [1]  1  2  3  1  3  5 25
Each of our numbers is still there, and can be accessed directly if you want. They are now just part of a new object, a vector, called my_numbers.
You create objects by assigning them to names. The assignment operator is <-. Think of assignment as the verb “gets,” reading left to right. So, the bit of code above is read as “The object my_numbers gets the result of concatenating the following numbers: 1, 2, …”
The operator is two separate keys on your keyboard: the < key and the - (minus) key. When you create objects by assigning things to names, they come into existence in R’s workspace or environment.
You do things using functions
You do almost everything in R using functions. Think of a function as a special kind of object that can perform actions for you. It produces output based on the input that it receives. Like a good dog, when you want a function to do something, you call it. Somewhat less like a dog, it will reliably do what you tell it.
You give the function some information, it acts on that information, and some results come out the other side. Functions can be recognized by the parentheses at the end of their names. This distinguishes them from other objects, such as single numbers, named vectors, tables of data, and so on.
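A brief sketch of the difference between a function and other objects, using only base R. Typing a function's name without parentheses refers to the function object itself; adding parentheses calls it.

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)

is.function(mean)         # TRUE: mean (no parentheses) is the function object
is.function(my_numbers)   # FALSE: a numeric vector is a different kind of object
class(mean)               # "function"

mean(my_numbers)          # the parentheses call the function on my_numbers
```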
You send information to the function between the parentheses. Most functions accept at least one argument. A function’s arguments are the things it needs to know in order to do something. They can be some bit of your data (data = my_numbers), or specific instructions (title = "GDP per Capita"), or an option you want to choose (smoothing = "splines", show = FALSE).
For example, the object my_numbers is a numeric vector:
my_numbers
## [1]  1  2  3  1  3  5 25
But the thing you used to create it, c(), is a function. It combines the items into a vector composed of the series of comma-separated elements you give it. Similarly, mean() is a function that calculates a simple average for a vector of numbers. What happens if you just type mean() without any arguments inside the parentheses?
mean()
The error message is terse but informative. The function needs an argument to work, and you haven’t given it one. In this case, the missing argument is x, the name of an object that mean() can perform its calculation on:
mean(x = my_numbers)
## [1] 5.714286
Or
mean(x = your_numbers)
## [1] 19.71429
While the function arguments have names that are used internally, (here, x =), you don’t strictly need to specify the name for the function to work:
mean(my_numbers)
## [1] 5.714286
If you omit the name of the argument, R will just assume you are giving the function what it needs, in the order the function expects. The documentation for a function will tell you what the order of required arguments is for any particular function.
For simple functions that only require one or two arguments, omitting their names is usually not confusing. For more complex functions, you will typically want to use the names of the arguments rather than try to remember what the ordering is.
In general, when providing arguments to a function the syntax is <argument> = <value>. If <value> is a named object that already exists in your workspace, like a vector of numbers or a table of data, then you provide it unquoted, as in mean(my_numbers). If <value> is not an object, a number, or a logical value like TRUE, then you usually put it in quotes, e.g., labs(x = "X Axis Label").
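A minimal sketch of argument matching, using the base R function seq() as a convenient example (it is not used elsewhere in these notes). Its first three arguments are from, to, and by.

```r
# Named arguments: the order does not matter
seq(from = 2, to = 10, by = 4)
seq(by = 4, to = 10, from = 2)

# Unnamed arguments: matched by position (from, to, by)
seq(2, 10, 4)

# An existing object is given unquoted; a text value is given in quotes
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
paste("Mean:", mean(my_numbers))
```

All three seq() calls return the same vector. With names, the call documents itself, which helps once a function has more than a couple of arguments.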
Functions take inputs via their arguments, do something, and return outputs. What the output is depends on what the function does. The c() function takes a sequence of comma-separated elements and returns a vector consisting of those same elements. The mean() function takes a vector of numbers and returns a single number, their average.
Functions can return far more than single numbers. The output returned by functions can be a table of data, or a complex object such as the results of a linear model, or the instructions needed to draw a plot. They can even be other functions. For example, the summary() function performs a series of calculations on a vector and produces what is in effect a little table with named elements.
A function’s argument names are internal to that function. Say you have created an object in your environment named x, for example. A function like mean() also has a named argument, x, but R will not get confused by this. It will not use your x object by mistake.
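A short sketch of this point. The object x below is made up and has nothing to do with the argument named x inside mean().

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
x <- c(100, 200, 300)   # an unrelated object that happens to be named x

mean(x = my_numbers)    # uses my_numbers, not the x in your environment
mean(x)                 # only here is your x object used
```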
As you have already seen with c() and mean(), you can assign the result of a function to an object:
my_summary <- summary(my_numbers)
When you do this, there’s no output to the console. R just puts the results into the new object, as you instructed. To look inside the object you can type its name and hit return:
my_summary
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   1.500   3.000   5.714   4.000  25.000
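Because the summary is a little table with named elements, you can pull out individual pieces by name. A minimal sketch:

```r
my_numbers <- c(1, 2, 3, 1, 3, 5, 25)
my_summary <- summary(my_numbers)

names(my_summary)       # the labels of the six elements
my_summary["Median"]    # pick one element, keeping its name
my_summary[["Max."]]    # double brackets return just the value
```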
Functions come in packages (libraries)
The code you write will be more or less complex depending on the task you want to accomplish. Families of useful functions are bundled into packages that you can install, load into your R session, and make use of as you work.
Packages save you from reinventing the wheel. They make it so that you do not, for example, have to figure out how to write code from scratch to draw a shape on screen, or load a data file into memory.
Packages are also what allow you to build on the efforts of others in order to do your own work. {ggplot2} is a package of functions.
There are many other such packages and you will make use of several throughout this course, either by loading them with the library() function, or “reaching in” to them and pulling a useful function from them directly.
All of the work you will do this semester will involve choosing the right function or functions, and then giving those functions the right instructions through a series of named arguments.
Most of the mistakes you will make, and the errors you will fix, will involve having not picked the right function, or having not fed the function the right arguments, or having failed to provide information in a form the function can understand.
For now, just remember that you do things in R by creating and manipulating named objects. You manipulate objects by feeding information about them to functions. The functions do something useful with that information (calculate a mean, re-code a variable, fit a model) and give you the results back.
Try these out.
table(my_numbers)
## my_numbers
##  1  2  3  5 25
##  2  1  2  1  1
sd(my_numbers)
## [1] 8.616153
my_numbers * 5
## [1]   5  10  15   5  15  25 125
my_numbers + 1
## [1]  2  3  4  2  4  6 26
my_numbers + my_numbers
## [1]  2  4  6  2  6 10 50
The first two functions here gave us a simple table of counts and calculated the standard deviation of my_numbers.
It’s worth noticing what R did in the last three cases. First you multiplied my_numbers by five. R interprets that as you asking it to take each element of my_numbers one at a time and multiply it by five. It does the same with the instruction my_numbers + 1. The single value is “recycled” down the length of the vector.
By contrast, in the last case we add my_numbers to itself. Because the two objects being added are the same length, R adds each element in the first vector to the corresponding element in the second vector.
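Recycling also applies when the shorter vector has more than one element. The following sketch uses two made-up vectors (short and long are names chosen for illustration) to show the shorter one being repeated to match the longer one.

```r
short <- c(10, 20)       # hypothetical two-element vector
long  <- c(1, 2, 3, 4)   # hypothetical four-element vector

# short is recycled to 10, 20, 10, 20 before the addition
recycled_sum <- long + short
recycled_sum
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.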
Your turn
Create a code chunk to compute the coefficient of variation (standard deviation divided by the mean) for your numbers (my_numbers).
Tuesday, August 30, 2022
Today
- More graphing examples
- How R works
If your analysis is to be convincing, the trail from data to final output must be open and available to all. Markdown helps you create scientific reports that are a mixture of text and code. This makes it easy to create an understandable trail from hypothesis, to data, to analysis, to results. That is the basis of reproducible research.
Scatter plots
Functions from the {ggplot2} package are used to make graphs. You make these graphing functions available for a given session of R (every time you open RStudio) with the library(ggplot2) function.
As an example, consider the data frame called airquality. The data contains daily air quality measurements from a location in New York City between May and September of 1973.
Follow along by pressing the green arrows when you get to a code chunk.
head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6
dim(airquality)
## [1] 153   6
The data contains 153 rows and 6 columns. Each row is a set of measurements across six variables on a given day.
Most data you will work with are like this. Each row is a set of measurements (a case) and each column is a variable.
The columns (variables) include the measurements of ozone concentration (Ozone) (ppb), solar radiation (Solar.R) (langleys), wind speed (Wind) (mph), temperature (Temp) (°F), as well as Month and Day.
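As a quick sketch, the row and column structure described above can be checked with a few base R functions. The object var_names is a name introduced here for illustration.

```r
nrow(airquality)                # number of rows (cases)
ncol(airquality)                # number of columns (variables)
var_names <- names(airquality)  # the variable names
var_names
```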
Question: Are ozone concentrations higher on warmer days? Let’s see what the data say.
The scatter plot is one of the most useful statistical graphs. It describes the relationship between two variables. It is made by plotting the variables in a plane defined by the values of the variables.
Using the {ggplot2} functions, you answer the question above by mapping the Temp variable to the x aesthetic and the Ozone variable to the y aesthetic.
More simply you could say that you plot Temp on the x axis and Ozone on the y axis. But you want to recognize that the axes are aesthetics (there are other aesthetics like color, size, etc.).
library(ggplot2)
ggplot(data = airquality) +
  geom_point(mapping = aes(x = Temp, y = Ozone))
## Warning: Removed 37 rows containing missing values (geom_point).

What do you see? Why the warning?
To suppress the warning, you add the argument na.rm = TRUE in the geom_point() function.
ggplot(data = airquality) +
  geom_point(mapping = aes(x = Temp, y = Ozone),
             na.rm = TRUE)
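One way to see where the warning comes from, sketched here with base R only, is to count the missing values (NA) in each of the two plotted variables. The object names are chosen for illustration.

```r
# Count missing values in each plotted variable
n_missing_ozone <- sum(is.na(airquality$Ozone))
n_missing_temp  <- sum(is.na(airquality$Temp))

n_missing_ozone  # days with no ozone measurement
n_missing_temp   # days with no temperature measurement
```

A point can be drawn only when both its x and y values are present, so the number of rows dropped matches the number of days without an ozone measurement.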
To help us better describe the relationship you add another layer. This layer is defined by geom_smooth() which takes the same aesthetics.
ggplot(data = airquality) +
  geom_point(mapping = aes(x = Temp, y = Ozone), na.rm = TRUE) +
  geom_smooth(mapping = aes(x = Temp, y = Ozone), na.rm = TRUE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The smooth line describes how the average ozone concentration varies with temperature. For lower temperatures there is not much change in ozone concentrations as temperatures increase, but for higher temperatures the increase in ozone concentrations is more pronounced.
In the above code you used the same mapping for the point layer and the smooth layer. You can simplify the code by putting the mapping = argument into the ggplot() function.
ggplot(data = airquality,
       mapping = aes(x = Temp, y = Ozone)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(na.rm = TRUE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Question: On average is ozone concentration higher on windy days? Create a graph to help you answer this question.
ggplot(data = airquality,
       mapping = aes(x = Wind, y = Ozone)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(na.rm = TRUE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

What is the answer?
You can use a label instead of a dot for the locations in this two-dimensional scatter plot by adding the label aesthetic and using geom_text.
ggplot(data = airquality,
       mapping = aes(x = Wind, y = Ozone, label = Ozone)) +
  geom_text(na.rm = TRUE)
To map an aesthetic to a variable, associate the name of the aesthetic to the name of the variable inside aes().
You can make the plot interactive by using the ggplotly() function from the {plotly} package. You simply put the above code inside this function.
plotly::ggplotly(
  ggplot(data = airquality,
         mapping = aes(x = Temp, y = Ozone)) +
    geom_point(na.rm = TRUE) +
    geom_smooth(na.rm = TRUE)
)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Hover/zoom etc.
As another example, consider the Palmer penguin data set from https://education.rstudio.com/blog/2020/07/palmerpenguins-cran/.
The data are located on the web at the following URL. You first save the location as an object called loc.
loc <- "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv"
Note that this object is now located in your environment. It is simply a string of characters (letters, slashes, etc.) in quotes. A character object.
Next you get the data and save it as an object called penguins with the read_csv() function from the {readr} package. Inside the parentheses of the function you put the name of the location.
penguins <- readr::read_csv(loc)
## Rows: 344 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): species, island, sex
## dbl (5): bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, year
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Note that the object penguins is now in your environment. It is a data frame containing 344 rows (observations) and 8 columns (variables). Typing the name of the object prints the first 10 rows and as many columns as fit on the screen (6 here); the remaining columns are listed at the bottom.
penguins
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 334 more rows, and 2 more variables: sex <chr>, year <dbl>
The data are 344 individual penguins, each described by species (Adelie, Chinstrap, Gentoo), where it was found (island name), length of bill (mm), depth of bill (mm), length of flipper (mm), body mass (g), sex, and year.
Each penguin belongs to one of three species. To see how many of the 344 penguins are in each species you use the table() function. Between the parentheses of this function you put the name of the data penguins followed by the $ sign followed by the name of the column species.
table(penguins$species)
##
##    Adelie Chinstrap    Gentoo
##       152        68       124
Said another way, you reference columns in the data with the $ sign so that penguins$species is how you refer to the column species in the data object named penguins.
There are 152 Adelie, 68 Chinstrap, and 124 Gentoo penguins.
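The $ reference and the table() function work the same way on any data frame. Here is a minimal sketch on a small made-up data frame (birds is hypothetical and is not the penguin data), so that it runs without downloading anything.

```r
# A small made-up data frame for illustration
birds <- data.frame(
  species = c("Adelie", "Gentoo", "Adelie", "Chinstrap", "Adelie"),
  mass_g  = c(3750, 5000, 3800, 3700, 3450)
)

birds$species                            # the species column as a vector
species_counts <- table(birds$species)   # number of rows for each species
species_counts
prop.table(species_counts)               # the same counts as proportions
```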
You plot the relationship between flipper length and body mass for all the penguins, regardless of species.
ggplot(data = penguins,
       mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
## Warning: Removed 2 rows containing missing values (geom_point).

Penguin flipper length and body mass show a positive relationship (association). Penguins with longer flippers tend to be larger.
How does this positive relationship vary by species?
You answer this question with another aesthetic. You assign a level of the aesthetic (here a color) to each unique value of the variable, a process known as scaling. The ggplot() function also adds a legend that explains which levels correspond to which values.
ggplot(data = penguins,
       mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  scale_color_manual(values = c("darkorange", "darkorchid", "cyan4"))
## Warning: Removed 2 rows containing missing values (geom_point).

Returning to the mpg data set from last time.
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point()
The colors reveal that the unusual points (on the right side of the plot) are two-seaters. Sports cars have large engines like SUVs and pickup trucks, but small bodies like midsize and compact cars, which improves their gas mileage. In hindsight, these cars were unlikely to be hybrids since they have large engines.
In the above example, you mapped class to the color aesthetic, but you could have mapped class to the shape aesthetic, which controls point shapes.
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy, shape = class)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 7. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 62 rows containing missing values (geom_point).

What happened to the SUVs? The ggplot() function will only use six shapes at a time. By default, additional groups will go un-plotted when you use the shape aesthetic.
For each aesthetic, you use aes() to associate the name of the aesthetic with a variable to display. The aes() function gathers together each of the aesthetic mappings used by a layer and passes them to the layer’s mapping argument.
The syntax highlights a useful insight about x and y: the x and y locations of a point are themselves aesthetics, visual properties that you can map to variables to display information about the data.
You can also set the aesthetic properties of your geom manually. For example, you can make all of the points in our plot blue.
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy)) +
  geom_point(color = "blue")
Here, the color doesn’t convey information about a variable, but only changes the appearance of the plot. To set an aesthetic manually, set the aesthetic by name as an argument of your geom function; i.e. it goes outside of aes(). You’ll need to pick a level that makes sense for that aesthetic:
- The name of a color as a character string (with quotes).
- The size of a point in millimeters.
- The shape of a point as a number, as shown below.
R has 25 shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the color and fill aesthetics. The hollow shapes (0–14) have a border determined by color; the solid shapes (15–20) are filled with color; the filled shapes (21–24) have a border of color and are filled with fill.
Facets
Another way to add variables to a plot, particularly useful for categorical variables, is to split the plot into facets. A facet is a subplot showing one subset of the data.
A categorical variable is one that can take only a limited, and usually fixed, number of possible values so you can split the plot for each value of the categorical variable.
You can use facet_wrap() to create a faceted plot. The first argument of facet_wrap() is a formula, which you create with ~ followed by a variable name (here “formula” is the name of a data structure in R). The variable that you pass to facet_wrap() should only have a limited number of values (categorical).
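To see that a formula is an ordinary R object, here is a short sketch using base R only. The variable name cyl is a placeholder; a formula stores the name without looking up any data.

```r
f <- ~ cyl     # a one-sided formula; cyl is only a name here
class(f)       # the type of the object
all.vars(f)    # the variable names the formula refers to
```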
The variable class in the data frame mpg is a character vector. You can see this by typing
str(mpg)
## tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
## $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
## $ model : chr [1:234] "a4" "a4" "a4" "a4" ...
## $ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr [1:234] "f" "f" "f" "f" ...
## $ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr [1:234] "p" "p" "p" "p" ...
## $ class : chr [1:234] "compact" "compact" "compact" "compact" ...
There are seven car classes. You put class in the facet_wrap() function. Everything is the same as before on the first two code lines but you add the facet_wrap() function.
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, nrow = 2)
The output is a set of separate scatter plots, one for each of the seven classes. More on graphs later.
Calculations
Let’s see how you can do some arithmetic in R.
R evaluates commands typed at the prompt and returns the result to the screen. The prompt is the blue greater than symbol (>). To find the sum of the square root of 25 and 2, at the prompt type
sqrt(25) + 2
## [1] 7
The number inside the brackets indexes the output. Here there is only one bit of output, the answer 7. The prompt that follows indicates R is ready for another command.
12/3 - 5
## [1] -1
How would you calculate the 5th power of 2? How would you find the product of 10.3 and -2.9? How would you find the average of 8.3 and 10.2?
How about 4.5% of 12,000?
.045 * 12000
## [1] 540
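R follows the usual order of operations, which matters in expressions like 12/3 - 5. The sketch below uses arbitrary numbers and object names to show how parentheses change the result.

```r
a <- 12 / 3 - 5      # division first: (12 / 3) - 5
b <- 12 / (3 - 5)    # parentheses force the subtraction first
p <- 2 + 3 * 4       # multiplication before addition
q <- (2 + 3) * 4     # parentheses force the addition first

c(a, b, p, q)
```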
Functions
Many math and statistical functions are available. A function has a name followed by a pair of parentheses. Arguments are placed inside the parentheses as needed.
For example,
sqrt(2)
## [1] 1.414214
sin(pi)
## [1] 1.224647e-16
How do you interpret this output? Type (highlight then click Run): .0000000000000001224647
Why not zero? What does the e-16 mean?
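A short sketch that may help with these questions: computers store numbers like pi with finite precision, so results that should be exactly zero can come out as very small numbers, and the e notation is a compact way of writing powers of ten.

```r
sin(pi) == 0                    # FALSE: tiny, but not exactly zero
isTRUE(all.equal(sin(pi), 0))   # TRUE: zero within numerical tolerance

1.5e-3 == 0.0015                # TRUE: e-3 means "times 10 to the power -3"
format(1.5e-3, scientific = FALSE)
```

When comparing computed values, all.equal() is usually a safer choice than ==.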
exp(1)
## [1] 2.718282
log(10)
## [1] 2.302585
Many functions have arguments with default values. For example, you only need to tell the random number generator rnorm() how many numbers to produce. The default mean is zero. To replace the default value, specify the corresponding argument name.
rnorm(10)
##  [1]  1.01202934  0.15811837  0.49029521 -0.09816279  0.24202958  1.73954980
##  [7]  0.32049056 -1.65061001  1.04496800 -2.20580096
rnorm(10, mean = 5)
##  [1] 4.547750 3.858727 5.942633 4.396579 5.421752 4.869154 4.166234 5.196839
##  [9] 5.835622 5.205494
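You can look up the default values yourself. As a sketch, formals() lists a function's arguments and their defaults, and more than one default can be replaced at a time. The seed value and sample size below are arbitrary choices.

```r
formals(rnorm)$mean   # default mean
formals(rnorm)$sd     # default standard deviation

set.seed(42)          # arbitrary seed so the draws can be repeated
draws <- rnorm(1000, mean = 5, sd = 2)
mean(draws)           # close to 5
sd(draws)             # close to 2
```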
Syntax is important
You get an error message when you type a function that R does not understand. For example:
squareroot(2)
Error: could not find function “squareroot”
sqrt 2
Error: syntax error
sqrt(-2)
## Warning in sqrt(-2): NaNs produced
## [1] NaN
sqrt(2
The last command shows what happens if R encounters a line that is not complete. The continuation prompt (+) is printed, indicating you did not finish the command.
Saving an object
Use the assignment operator to save an object. You put a name on the left-hand side of the left pointing arrow (<-) and the value on the right. Assignments do not produce output.
x <- 2
x + 3
## [1] 5
x <- 10
Here you assigned x to be a numeric object. Assignments are made using the left-pointing arrow (a less-than sign followed by a dash) or an equal sign.
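A brief sketch of both assignment forms, with object names chosen for illustration. Wrapping an assignment in parentheses is a convenient way to assign and print in one step.

```r
y <- 3        # assigns; nothing is printed
(y <- 3)      # in parentheses: assigns and prints the value
z = y + 1     # the equal sign also assigns
z
```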
Object names
You are free to make object names out of letters, numbers, and the dot or underline characters. A name starts with a letter or a dot (a leading dot may not be followed by a number). But you can’t use mathematical operators, such as +, -, *, and /.
Some examples of names include:
x <- 2
n <- 25
a.long.number <- 123456789
ASmallNumber <- .001
Case matters. DF is different from df or Df.
Some names are commonly used to represent certain types of data. For instance, n is for length; x or y are data vectors; and i and j are integers and indices.
These conventions are not forced, but consistent use of them makes it easier for you (and others) to understand what you’ve done.
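The naming rules can be tried out without causing errors. In this sketch, make.names() is used as a convenience check (an assumption of this example, not part of the lesson): it returns a name unchanged only if the name is already valid. The candidate names are made up.

```r
candidates <- c("a.long.number", "ASmallNumber", ".hidden",
                "2fast", ".2way", "my-name")

# A name is valid if make.names() has nothing to repair
is_valid <- make.names(candidates) == candidates
data.frame(candidates, is_valid)
```

The last three fail because they start with a number, start with a dot followed by a number, or contain a mathematical operator.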
Entering data
The c() function is useful for getting a small amount of data into R. The function combines (concatenates) items (elements). Example: consider a set of hypothetical annual land falling hurricane counts over a ten-year period.
2 3 0 3 1 0 0 1 2 1
To enter these into your environment, type
counts <- c(2, 3, 0, 3, 1, 0, 0, 1, 2, 1)
counts
## [1] 2 3 0 3 1 0 0 1 2 1
Notice a few things. You assigned the values to an object called counts. The assignment operator is the left-pointing arrow (<-). Values do not print when they are assigned to an object name.
They are printed by typing the object name as you did on the second line. Finally, the values when printed are prefaced with a [1]. This indicates that the object is a vector and the first entry in the vector is a value of 2 (The number immediately to the right of [1]). More on this later.
You can save some typing by using the arrow keys to retrieve previous commands. Each command is stored in a history file and the up arrow key will move backwards through the history file and the down arrow forwards. The left and right arrow keys will work as expected.
Applying a function
Once the data are stored in an object, you use functions on them. R comes with all sorts of functions that you can apply to your counts data.
sum(counts)
## [1] 13
length(counts)
## [1] 10
sum(counts)/length(counts)
## [1] 1.3
For this example, the sum() function returns the total number of hurricanes making landfall. The length() function returns the number of years, and sum(counts)/length(counts) returns the average number of hurricanes per year.
Other useful functions include sort(), min(), max(), range(), diff(), and cumsum(). Try these on the landfall counts. What does range() do? What does diff() do?
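To get a feel for these functions before trying them on the landfall counts, here is a sketch on a short made-up vector (rain is a hypothetical name and the values are arbitrary).

```r
rain <- c(3, 1, 4, 1, 5)   # hypothetical values

sort(rain)     # values from smallest to largest
min(rain)      # smallest value
max(rain)      # largest value
range(rain)    # smallest and largest values together
diff(rain)     # each value minus the one before it
cumsum(rain)   # running total
```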
Average
The average (or mean) value of a set of numbers (\(x\)’s) is defined as:
\[
\bar x = \frac{x_1 + x_2 + \cdots + x_n}{n}
\]
The function mean() makes this calculation on your set of counts.
mean(counts)
## [1] 1.3
Data vectors
The count data are stored as a vector. R keeps track of the order in which the data were entered: first element, second element, and so on. This is good for a couple of reasons. Here the data have a natural order - year 1, year 2, etc. You don’t want to mix these. You would like to be able to make changes to the data item by item instead of entering the entire data again. Also, vectors are math objects, making them easy to manipulate.
Suppose counts contains the annual number of land-falling hurricanes from the first decade of a longer record. You want to keep track of counts over other decades. This could be done as in the following example.
cD1 <- counts
cD2 <- c(0, 5, 4, 2, 3, 0, 3, 3, 2, 1)
Note that you make a copy of the first decade of counts and save the vector using a different object name.
Most functions operate on each element of the data vector at the same time.
cD1 + cD2
## [1] 2 8 4 5 4 0 3 4 4 2
The first year of the first decade is added to the first year of the second decade and so on.
What happens if you apply the c() function to these two vectors?
c(cD1, cD2)
##  [1] 2 3 0 3 1 0 0 1 2 1 0 5 4 2 3 0 3 3 2 1
If you are interested in each year’s count as a difference from the decade mean, you type:
cD1 - mean(cD1)
##  [1]  0.7  1.7 -1.3  1.7 -0.3 -1.3 -1.3 -0.3  0.7 -0.3
In this case a single number (the mean of the first decade) is subtracted from each element of the vector of counts.
This is an example of data recycling. R repeats values from one vector so that its length matches the other vector. Here the mean is repeated 10 times.
Variance
Suppose you are interested in the variance of the set of landfall counts. The formula is given by: \[ \hbox{var}(x) = \frac{(x_1 - \bar x)^2 + (x_2 - \bar x)^2 + \cdots + (x_n - \bar x)^2}{n-1} \]
Note: The formula is given as LaTeX math code with the double dollar signs starting (and ending) the math mode. It’s a bit hard to read but it translates exactly to math as you would read it in a scientific article or textbook. Look at the HTML file.
Although the var() function will compute this for you, here you see how you could do this directly using the vectorization of functions. The key is to find the squared differences and then add them up.
x <- cD1
xbar <- mean(x)
x - xbar
##  [1]  0.7  1.7 -1.3  1.7 -0.3 -1.3 -1.3 -0.3  0.7 -0.3
(x - xbar)^2
##  [1] 0.49 2.89 1.69 2.89 0.09 1.69 1.69 0.09 0.49 0.09
sum((x - xbar)^2)
## [1] 12.1
n <- length(x)
n
## [1] 10
sum((x - xbar)^2)/(n - 1)
## [1] 1.344444
To verify, type
var(x)
## [1] 1.344444
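The standard deviation is the square root of the variance, so the same step-by-step approach gives it with one more operation. This is a sketch using the first-decade counts; sd_by_hand is a name introduced here.

```r
x <- c(2, 3, 0, 3, 1, 0, 0, 1, 2, 1)   # first-decade counts from above

# Square root of the sum of squared differences divided by n - 1
sd_by_hand <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
sd_by_hand
sd(x)   # the built-in function for comparison
```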
Data vectors have a type
One restriction on data vectors is that all the values have the same type. This can be numeric, as in counts, or character strings, as in
simpsons <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
simpsons
## [1] "Homer"  "Marge"  "Bart"   "Lisa"   "Maggie"
Note that character strings are made with matching quotes, either double, ", or single, '.
If you mix the type within a data vector, the data will be coerced into a common type, which is usually a character. Arithmetic operations do not work on characters.
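A minimal sketch of coercion, using made-up vectors (mixed and num_logical are names chosen for illustration):

```r
mixed <- c(1, "a", TRUE)
mixed                 # every element is now a character string
class(mixed)

num_logical <- c(1, TRUE, FALSE)   # logical values become 1 and 0
num_logical
class(num_logical)
```

The rule of thumb is that R picks the most general type present: character wins over numeric, and numeric wins over logical.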
Returning to the land falling hurricane counts.
cD1 <- c(2, 3, 0, 3, 1, 0, 0, 1, 2, 1)
cD2 <- c(0, 5, 4, 2, 3, 0, 3, 3, 2, 1)
Now suppose the National Hurricane Center (NHC) reanalyzes a storm, and that the 6th year of the 2nd decade is a 1 rather than a 0 for the number of landfalls. In this case you type
cD2[6] <- 1 # assign the 6th year of the decade a value of 1 landfall
The assignment to the 6th entry in the vector cD2 is done by referencing the 6th entry of the vector with square brackets [].
It’s important to keep this in mind: Parentheses () are used for functions and square brackets [] are used to extract values from vectors (and arrays, lists, etc). REPEAT: [] are used to extract or subset values from vectors, data frames, matrices, etc.
cD2 # print out the values
##  [1] 0 5 4 2 3 1 3 3 2 1
cD2[2] # print the number of landfalls during year 2 of the second decade
## [1] 5
cD2[4] # 4th year count
## [1] 2
cD2[-4] # all but the 4th year
## [1] 0 5 4 3 1 3 3 2 1
cD2[c(1, 3, 5, 7, 9)] # print the counts from the odd years
## [1] 0 4 3 3 2
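Square brackets also accept a condition instead of positions. As a sketch with the corrected second-decade counts (busy and peak_year are names introduced for illustration):

```r
cD2 <- c(0, 5, 4, 2, 3, 1, 3, 3, 2, 1)   # second-decade counts after the correction

busy <- cD2[cD2 > 2]          # counts from years with more than two landfalls
busy
peak_year <- which.max(cD2)   # position (year) of the largest count
peak_year
```

Inside the brackets, cD2 > 2 produces a TRUE or FALSE for each year, and only the TRUE positions are kept.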
One way to remember how to use functions is to think of them as pets. They don’t come unless they are called by name (spelled properly). They have a mouth (parentheses) that likes to be fed (arguments), and they will complain if they are not fed properly.
Working smarter
R’s console keeps a history of your commands. The previous commands are accessed using the up and down arrow keys. Repeatedly pushing the up arrow will scroll backward through the history so you can reuse previous commands.
Many times you wish to change only a small part of a previous command, such as when a typo is made. With the arrow keys you can access the previous command then edit it as desired.
Thursday, September 1, 2022
Today
- Data as vectors
- Sample statistics
- Structured data
- Tables and summaries
Data as vectors
The c() function is used to get small amounts of data into R. The function combines (concatenates) items (elements). Example: consider a set of hypothetical annual land falling hurricane counts over a ten-year period.
2 3 0 3 1 0 0 1 2 1
To save these values in our environment as a data object, type
counts <- c(2, 3, 0, 3, 1, 0, 0, 1, 2, 1)
counts
##  [1] 2 3 0 3 1 0 0 1 2 1
Once data are stored as an object, you use functions on them. Some common functions used on simple data objects include
sum(counts)
## [1] 13
length(counts)
## [1] 10
sum(counts)/length(counts)
## [1] 1.3
For this example, the sum() function returns the total number of hurricanes making landfall. The length() function returns the number of years, and sum(counts)/length(counts) returns the average number of hurricanes per year.
Mean
The average (or mean) value of a set of numbers (\(x\)’s) is defined as: \[ \bar x = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
Note: The formula is given as LaTeX math code with the double dollar signs starting (and ending) the math mode. It’s a bit hard to read but it translates exactly to math as you would read in a scientific article or textbook.
The function mean() makes this calculation on your set of counts.
mean(counts)
## [1] 1.3
The counts data is stored as a vector. R keeps track of the order that the data were entered. First element, second element, and so on. This is good for a couple of reasons. Here the data have a natural order - year 1, year 2, etc. You don’t want to mix these. You would like to be able to make changes to the data item by item instead of entering the entire data again. Also vectors are math objects making them easy to manipulate.
Suppose counts contains the annual number of land-falling hurricanes from the first decade of a longer record. You want to keep track of counts over other decades. This could be done as in the following example.
cD1 <- counts
cD2 <- c(0, 5, 4, 2, 3, 0, 3, 3, 2, 1)
Note that you make a duplicate copy of the vector called counts, giving it a different name.
Most functions operate on each element of the data vector at the same time.
cD1 + cD2
##  [1] 2 8 4 5 4 0 3 4 4 2
The first year of the first decade is added to the first year of the second decade and so on.
What happens if you apply the c() function to these two vectors?
c(cD1, cD2)
##  [1] 2 3 0 3 1 0 0 1 2 1 0 5 4 2 3 0 3 3 2 1
If you are interested in each year’s count as a difference from the decade mean, you type:
cD1 - mean(cD1)
##  [1]  0.7  1.7 -1.3  1.7 -0.3 -1.3 -1.3 -0.3  0.7 -0.3
In this case a single number (the mean of the first decade) is subtracted from each element of the vector of counts.
This is an example of data recycling. R repeats values from one vector so that the length of this vector matches the other, longer vector. Here the mean is repeated 10 times.
Variance
Suppose you are interested in how much the set of annual landfall counts varies from year to year. The formula for the variance is given by: \[ \hbox{var}(x) = \frac{(x_1 - \bar x)^2 + (x_2 - \bar x)^2 + \cdots + (x_n - \bar x)^2}{n-1} \]
Although the var() function will compute this, here you see how it can be computed from other simpler functions. The first step is to find the squared difference between each value and the mean. To simplify things first create a new vector x and assign the mean of the x’s to xbar.
x <- cD1
xbar <- mean(x)
x - xbar
##  [1]  0.7  1.7 -1.3  1.7 -0.3 -1.3 -1.3 -0.3  0.7 -0.3
(x - xbar)^2
##  [1] 0.49 2.89 1.69 2.89 0.09 1.69 1.69 0.09 0.49 0.09
The sum of the differences is zero, but not the sum of the squared differences.
sum((x - xbar)^2)
## [1] 12.1
n <- length(x)
n
## [1] 10
sum((x - xbar)^2)/(n - 1)
## [1] 1.344444
So the variance is 1.344. To verify with the var() function, type
var(x)
## [1] 1.344444
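The steps above can be collected into a small function so they can be reused on other vectors. This is only a sketch; my_var is a hypothetical name, and in practice you would use var().

```r
# Variance computed from the definition (my_var is a made-up name)
my_var <- function(v) {
  dev <- v - mean(v)             # differences from the mean
  sum(dev^2) / (length(v) - 1)   # sum of squared differences over n - 1
}

x <- c(2, 3, 0, 3, 1, 0, 0, 1, 2, 1)   # first-decade counts
sum(x - mean(x))   # the differences sum to zero (up to rounding error)
my_var(x)
```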
Median
Recall that the mean is a statistic calculated on our data. Typically there are more data values close to the mean than far from it. A normal random variable is within two standard deviations of its mean about 95% of the time.
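The 95% figure can be checked with pnorm(), which gives the probability that a standard normal variable falls below a given value. A brief sketch (p_within_2sd is a name chosen for illustration):

```r
# Probability of falling between -2 and +2 standard deviations of the mean
p_within_2sd <- pnorm(2) - pnorm(-2)
p_within_2sd
```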
The median is a statistic defined exactly as the middle value.
For example, consider a set of seven data values. Here the seven values are generated randomly. The set.seed() function guarantees that everyone (with a particular seed number) will get the same set of values.
set.seed(3043)
y <- rnorm(n = 7)
sort(y)
## [1] -1.855028975 -1.536523195 -1.113848013 -0.863720993 -0.813241685
## [6]  0.002064746  1.024752099
The argument value n = 7 guarantees seven values. They are sorted from lowest on the left to highest on the right with the sort() function. The middle value is the fourth value from the left in the ordered list of data values.
median(y)
## [1] -0.863721
The median divides the data set into the top half (50%) of the data values and the bottom half of the data values.
With an odd number of values, the median is the middle one; with an even number of values, the median is the average of the two middle values.
y <- rnorm(n = 8)
sort(y)
## [1] -2.03716871 -1.32753574 -0.74852359 -0.62357212  0.07656504  0.50029011
## [7]  1.38629034  1.42971671
median(y)
## [1] -0.2735035
You can check that this is true no matter what the values are or what even number of values you choose.
N = 20
y <- rnorm(n = N)
y_sorted <- sort(y)
median(y) == (y_sorted[N/2] + y_sorted[N/2 + 1]) / 2
## [1] TRUE
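The odd and even rules can be written as one small function. This is a sketch for illustration only (my_median is a hypothetical name); the built-in median() is what you would normally use.

```r
# Median computed from the definition (my_median is a made-up name)
my_median <- function(v) {
  s <- sort(v)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                  # odd count: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2   # even count: average of the two middle values
  }
}

my_median(c(5, 1, 3))       # odd number of values
my_median(c(4, 1, 3, 2))    # even number of values
```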
The median value, as a statistic representing the middle of a set of data values, is said to be resistant to extreme values (outliers).
Consider the wealth (in 1000s of $) of five bar patrons.
patrons <- c(50, 60, 100, 75, 200)
Now consider the same bar and patrons after a multimillionaire walks in.
patrons_with_mm <- c(patrons, 50000)
mean(patrons)
## [1] 97
mean(patrons_with_mm)
## [1] 8414.167
median(patrons)
## [1] 75
median(patrons_with_mm)
## [1] 87.5
The difference in the mean wealth with and without the multimillionaire present is substantial, while the difference in median wealth with and without the multimillionaire is small.
Statistics that are not greatly influenced by a few values far from the bulk of the data are called resistant.
The cfb data set from the {UsingR} package has data from the Survey of Consumer Finances conducted by the U.S. Federal Reserve Board (in 2001). Some of the income values are much higher than the bulk of the data. This tendency is common in income distributions. A few people tend to accumulate enormous wealth.
Make the data available with the library() function, then print the data by typing the name of the data object (cfb). R prints the whole data frame; only the first part of the output is shown here.
library(UsingR)
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## Loading required package: HistData
## Loading required package: Hmisc
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
##
## Attaching package: 'UsingR'
## The following object is masked from 'package:survival':
##
## cancer
cfb
## WGT AGE EDUC INCOME CHECKING SAVING NMMF STOCKS
## X17470 5749.9746 54 14 66814.1946 6000 2000 0 500
## X315 5870.6340 40 12 42144.3381 400 0 0 0
## X8795 8043.6950 35 14 25697.7671 1000 160 0 0
## X10720 6092.8720 55 12 35976.8740 2600 19100 0 0
## X19170 7161.7566 40 12 39060.6061 1000 8300 0 3500
## X22075 11429.6335 82 12 13362.8389 1000 0 50000 0
## X12235 5988.0417 26 16 61674.6411 3000 0 0 0
## X7670 7111.7751 50 14 53451.3557 3100 0 0 0
## X16555 7602.8631 71 12 16446.5710 1000 0 0 0
## X370 9917.0148 70 6 9867.9426 50 0 0 0
## X7680 7263.7921 52 12 35976.8740 1700 3000 2000 0
## X6880 7039.9174 53 11 7195.3748 0 0 0 0
## X16570 6523.7932 27 16 78121.2121 8500 8000 1100 1500
## X12945 6490.4551 27 12 28781.4992 0 0 4000 0
## X6725 8265.3192 69 12 12334.9282 500 1600 0 0
## X15725 1616.6743 55 17 459476.0766 0 0 0 0
## X19880 6805.1027 42 14 54479.2663 3200 55 0 0
## X225 6865.3880 73 12 43172.2488 11000 2000 100000 0
## X4995 7731.3206 76 12 69897.9266 296440 0 0 0
## X7700 5693.9061 43 12 58590.9091 750 1700 13000 0
## X11375 6660.1557 48 11 52423.4450 0 1200 17600 0
## X17920 6764.6424 57 12 25697.7671 1600 900 0 0
## X12365 5591.7642 44 16 51395.5343 590 1780 0 14000
## X920 5812.9110 44 15 87372.4083 300 32700 0 0
## X19050 1022.1029 59 13 59618.8198 1500 0 0 75000
## X19555 8909.1588 47 14 25697.7671 17320 730 0 0
## X10520 4336.5281 25 14 26725.6778 800 1500 0 0
## X18705 8691.5555 28 16 71953.7480 4020 11830 0 0
## X5095 7620.1135 74 12 48311.8022 3500 0 0 0
## X11010 7683.5398 62 11 6475.8373 0 250 0 0
## X3540 10144.6672 23 12 28781.4992 420 340 0 0
## X14950 7328.9577 40 14 71953.7480 20800 0 0 0
## X4830 7069.5583 44 13 3700.4785 350 1000 0 0
## X2865 10911.3427 65 11 26725.6778 7000 6000 7500 22000
## X20945 6415.1554 35 17 54479.2663 1200 9310 0 0
## X13040 5263.6488 40 13 66814.1946 0 380 0 0
## X4515 5360.7266 33 11 28781.4992 500 0 0 0
## X145 5696.8902 21 11 513.9553 20 20 0 0
## X18685 8417.3121 63 13 41116.4274 180 0 0 0
## X17585 6373.6917 52 17 57562.9984 1000 0 0 0
## X10090 5114.4060 24 14 28781.4992 0 0 0 0
## X13235 5454.0787 29 14 9251.1962 2000 20000 0 0
## X3045 5454.0787 46 14 92511.9617 2500 1500 88000 0
## X21425 5696.2367 38 12 11307.0175 50 40 0 0
## X11840 5361.2218 34 13 7400.9569 0 0 0 0
## X3400 6327.2872 47 12 30837.3206 3500 0 0 0
## X6635 7173.3284 49 14 37004.7847 3200 2300 0 0
## X19815 6188.2375 83 12 25697.7671 5100 1800 0 0
## X19565 5788.8378 50 16 25697.7671 1500 350 0 0
## X12135 7998.0705 68 16 104846.8899 14600 7100 0 35000
## X10700 6501.2709 83 14 58590.9091 0 0 330000 275000
## X2600 7956.8927 28 17 61674.6411 6000 0 20000 0
## X2860 6604.7905 45 12 20558.2137 1500 530 0 0
## X2175 4522.0593 57 11 33921.0526 600 3000 0 0
## X14915 9185.1147 50 12 169605.2632 3500 0 0 0
## X66351 7173.3284 49 14 37004.7847 3200 2300 0 0
## X6575 1688.0257 40 14 153158.6922 2000 0 6000 0
## X8410 6793.3807 29 13 15418.6603 3500 3000 0 0
## X7230 5859.0521 59 15 15418.6603 2000 1850 200000 0
## X12955 10373.1531 69 6 12334.9282 1000 0 0 0
## X19205 7691.5051 44 12 53451.3557 1000 1050 0 0
## X600 5976.9863 59 12 38032.6954 990 1200 0 0
## X1290 6655.6238 22 14 15418.6603 250 0 0 0
## X17070 243.6350 40 16 925119.6172 22000 0 275000 175000
## X16140 6677.0208 68 14 44200.1595 22000 50000 0 0
## X17935 9636.8011 52 16 81204.9442 0 0 0 0
## X3605 5198.3414 39 16 47283.8915 4000 3000 25000 0
## X10275 5933.3748 43 16 144935.4067 7700 15300 17000 17000
## X19930 7944.1474 28 13 75037.4801 3200 50 0 500
## X15360 7421.8016 40 12 123349.2823 1000 6000 50000 20000
## X1075 7485.5250 77 12 12334.9282 1700 12390 43000 0
## X7770 9527.8477 78 16 34948.9633 2000 100 0 0
## X1010 6341.0975 58 8 12334.9282 1500 1200 0 0
## X7095 4293.7517 37 14 14390.7496 660 0 0 0
## X14255 7427.1703 78 12 11307.0175 2200 0 0 0
## X20075 10164.9687 56 14 37004.7847 1000 0 0 0
## X2610 5551.9820 28 12 28781.4992 270 300 0 0
## X965 5837.2792 83 12 14390.7496 600 20000 0 0
## X17515 6220.5890 48 16 41116.4274 2000 5000 0 0
## X1755 6270.0639 24 9 8223.2855 0 0 0 0
## X16440 11386.7530 57 13 113070.1754 0 3500 0 2000
## X14750 7029.1679 29 16 37004.7847 400 100 0 0
## X16960 8067.4672 36 17 63730.4625 2000 0 40000 20000
## X575 5111.3136 24 11 24669.8565 0 0 0 0
## X12340 7216.5318 79 16 25697.7671 7500 0 28000 190000
## X3250 5516.1522 40 17 12334.9282 300 0 0 0
## X21805 3597.7161 51 16 81204.9442 5000 0 50000 30000
## X17860 2751.6615 49 12 8223.2855 14600 500 0 0
## X6260 3036.6357 44 14 177828.5486 0 0 50000 85000
## X8435 4689.7790 25 15 40088.5167 160 0 0 0
## X10795 6313.4185 55 17 87372.4083 1500 96000 27000 0
## X9785 6018.8547 48 6 35976.8740 2850 0 0 0
## X17455 8340.9656 40 14 46255.9809 4000 0 0 0
## X11275 10483.6685 68 12 27753.5885 3300 8000 32000 116000
## X6785 7596.6439 45 12 56535.0877 500 505 0 0
## X12920 6468.5210 25 14 4625.5981 0 0 0 0
## X12685 6937.5423 50 17 50367.6236 1800 1850 0 0
## X7575 5875.4599 67 14 66814.1946 200 0 0 0
## X16745 8034.5602 30 14 52423.4450 2000 0 0 0
## X3925 6698.1550 28 12 37004.7847 0 0 0 0
## X13715 7485.8803 21 15 11307.0175 800 340 0 0
## X2630 6623.7739 31 16 149047.0494 2000 2300 0 1700
## X1880 7673.4807 42 17 12334.9282 220 0 0 0
## X16810 5375.4516 29 14 38032.6954 20 0 0 0
## X7535 5532.8460 23 14 25697.7671 1200 1 0 0
## X17395 4448.8961 36 13 31865.2313 0 0 0 0
## X20265 4733.4575 40 16 28781.4992 820 400 0 18000
## X16645 6010.7120 58 13 113070.1754 6030 10000 0 0
## X18180 4583.3587 52 12 117181.8182 5400 21000 250000 0
## X4825 5070.4577 38 13 33921.0526 0 0 0 0
## X1845 8154.7752 78 16 64758.3732 3500 1500 0 26000
## X5425 10038.8263 40 12 113070.1754 3000 1500 0 8300
## X10600 8502.3051 68 14 61674.6411 4000 36000 0 0
## X10360 8298.7768 68 11 10279.1069 0 0 0 0
## X19890 4456.3079 27 13 16446.5710 750 1550 0 17000
## X20500 8349.2691 26 10 19530.3030 0 0 0 0
## X2565 6641.8552 42 16 88400.3190 2000 44000 100000 20000
## X26002 7956.8927 28 17 61674.6411 6000 0 20000 0
## X19845 4405.0395 26 12 38032.6954 140 0 0 0
## X18965 8152.5724 59 12 43172.2488 9500 7700 0 0
## X11230 3934.7121 28 11 15418.6603 0 0 0 0
## X11260 7423.0858 75 14 67842.1053 3500 3400 45000 122000
## X3200 7098.8499 42 12 98679.4258 1600 13040 1000 600
## X5965 5871.1832 37 16 41116.4274 470 600 1300 0
## X107953 6313.4185 55 17 87372.4083 1500 96000 27000 0
## X11035 9078.7938 85 12 20558.2137 0 0 0 0
## X18245 5659.8661 54 17 91484.0510 7000 8200 0 0
## X11955 7244.9139 46 12 16446.5710 10000 10000 0 0
## X9345 6726.9283 92 9 3392.1053 80 0 0 0
## X2320 6434.5102 49 14 71953.7480 1300 1600 0 0
## X9295 1158.4185 71 16 65786.2839 3000 400 0 0
## X20110 5731.2341 48 14 51395.5343 5900 11000 0 0
## X680 6833.6584 48 16 100735.2472 3800 0 13000 0
## X13270 7537.6703 37 17 82232.8549 3200 1950 0 0
## X3075 7190.2136 42 17 40088.5167 200 2150 0 0
## X13160 9388.0984 42 17 61674.6411 15000 100 0 0
## X20435 3133.2430 35 17 1182097.2887 0 0 0 375000
## X12465 2146.5932 52 16 51395.5343 18000 40000 0 600000
## X4440 4599.0191 60 12 8223.2855 660 100 0 0
## X3870 7560.2604 63 12 29809.4099 1500 0 0 0
## X3510 6655.2299 40 12 80177.0335 1000 0 0 0
## X13795 6664.1853 18 10 7812.1212 0 0 0 0
## X18155 4538.6282 50 16 61674.6411 500 500 0 0
## X4685 7123.2132 57 16 51395.5343 1000 30000 0 5700
## X20135 4921.4820 44 12 29809.4099 470 300 0 0
## X7975 10857.6915 77 8 15418.6603 0 40000 0 0
## X16425 6688.8349 53 11 82232.8549 2000 300 0 0
## X84354 4689.7790 25 15 40088.5167 160 0 0 0
## X12905 7233.3450 62 2 71953.7480 1000 500 0 2500
## X15095 7819.0561 86 14 7195.3748 1010 132000 0 0
## X3625 7581.8314 34 11 6989.7927 0 0 0 0
## X198455 4405.0395 26 12 38032.6954 140 0 0 0
## X570 10431.8465 47 12 45228.0702 500 1 0 0
## X21195 6578.5191 74 16 38032.6954 5200 0 190 50000
## X16470 3597.7161 43 12 1408237.6396 10 0 0 0
## X14880 5711.2392 52 12 94567.7831 2150 2610 0 0
## X9485 8780.1580 55 10 1439.0750 5 400 0 0
## X17090 5797.9275 33 12 10279.1069 700 0 0 0
## X9670 11386.7530 45 16 92511.9617 5500 0 0 0
## X15945 4792.5122 44 12 10279.1069 0 0 0 0
## X13535 5532.8460 23 10 29809.4099 200 200 0 0
## X3685 7486.2704 48 10 16446.5710 0 0 0 0
## X540 6746.5369 45 12 40088.5167 750 180 0 0
## X17780 6655.8875 51 16 71953.7480 500 8000 0 400
## X21100 3253.9699 49 14 90456.1404 0 10000 100000 300000
## X4310 9939.8329 34 17 47283.8915 1500 0 0 0
## X2010 8301.2131 49 12 38032.6954 500 700 0 0
## X8785 6388.3726 55 12 48311.8022 2000 0 0 0
## X1045 7700.3724 56 9 34948.9633 500 5000 0 3000
## X2935 8045.5847 76 12 24669.8565 1500 0 0 0
## X11195 7192.3659 21 12 46255.9809 0 0 0 120000
## X110356 9078.7938 85 12 20558.2137 0 0 0 0
## X3410 6611.1226 50 16 67842.1053 10 700 0 1
## X17765 6235.1707 83 15 7812.1212 2000 0 0 0
## X9175 3265.4868 46 16 71953.7480 1000 1200 10000 0
## X6395 5644.9880 28 12 26725.6778 0 300 0 5700
## X485 5154.0603 49 16 19530.3030 0 0 0 0
## X870 1173.9354 40 16 80177.0335 1640 4100 0 1700
## X9220 4897.5131 37 9 12334.9282 40 0 0 0
## X1920 7487.6105 63 12 12334.9282 400 0 0 0
## X19230 8742.7099 63 15 27753.5885 1850 0 80000 75000
## X18475 2133.9750 67 17 53451.3557 4000 0 421000 375000
## X5895 5446.1083 45 7 17474.4817 0 520 0 0
## X3695 10109.2136 49 17 76065.3908 2000 500 0 0
## X17075 7726.8088 31 17 63730.4625 0 5000 0 0
## X21685 6899.7143 37 13 21586.1244 800 0 0 0
## X10410 5134.3240 25 16 50367.6236 0 300 0 0
## X1350 5540.6097 31 13 12334.9282 0 0 0 0
## X18760 5988.3062 43 17 71953.7480 500 0 0 0
## X3405 5303.8926 27 16 26725.6778 510 15 0 0
## X12035 5803.8741 35 12 46255.9809 770 0 0 0
## X305 6313.9774 47 14 68870.0159 2000 16100 5500 0
## X17850 7666.5600 72 12 185023.9234 3500 0 0 0
## X4110 1503.1836 38 16 1541866.0287 0 0 1530000 300000
## X4605 6478.4991 62 12 19530.3030 1800 15000 0 0
## X12555 4686.2076 25 11 26725.6778 0 0 0 0
## X5915 3330.3623 54 17 332015.1515 15000 23500 125000 0
## X22035 4823.1376 58 1 6578.6284 0 0 0 0
## X6930 5808.7163 31 12 58590.9091 12000 16500 0 5500
## X17060 10597.7984 80 10 23641.9458 4800 0 0 0
## X13760 6133.1493 57 12 53451.3557 50 18700 0 0
## X5825 6661.3144 56 16 31865.2313 1300 0 0 0
## X34057 5303.8926 27 16 26725.6778 510 15 0 0
## X20180 8410.7240 61 15 101763.1579 450 0 0 0
## X21130 11097.5342 78 12 15418.6603 310 0 0 0
## X12205 4681.8403 46 14 12334.9282 0 0 0 0
## X1265 9929.1222 77 12 25697.7671 0 0 0 0
## X13645 10246.9474 81 12 44200.1595 51000 0 0 0
## X905 7456.1503 23 11 12334.9282 700 0 0 0
## X21995 5929.3158 83 12 10279.1069 300 6000 0 0
## X6975 9338.9337 78 16 24669.8565 0 1100 0 0
## X16450 5872.4153 40 12 22614.0351 2500 120 0 0
## X14840 5671.0347 80 13 15418.6603 3700 3300 0 0
## X8300 6136.6248 46 14 52423.4450 1700 6150 0 0
## X645 2797.2649 52 17 192219.2982 2000 2000 0 0
## X2770 7022.5454 62 16 47283.8915 2700 2000 0 0
## X147508 7029.1679 29 16 37004.7847 400 100 0 0
## X1540 6385.0040 35 11 40088.5167 320 3240 0 0
## X19435 5019.3357 27 11 5961.8820 0 0 0 0
## X6765 9419.1196 72 16 30837.3206 400 0 0 0
## X54259 10038.8263 40 12 113070.1754 3000 1500 0 8300
## X19980 7630.0979 86 17 37004.7847 10000 20000 0 0
## X54010 6746.5369 45 12 40088.5167 750 180 0 0
## X21890 6316.2726 39 12 87372.4083 1000 15650 0 0
## X1220 8765.8772 76 8 23641.9458 0 18000 0 0
## X16615 837.3098 46 16 153158.6922 5000 0 0 750000
## X16905 11386.7530 76 16 28781.4992 5600 6800 48000 100000
## X9050 1101.0772 46 17 223056.6188 0 0 80000 14000
## X21165 5386.4622 40 14 71953.7480 0 110000 135000 0
## X16350 5073.3726 26 11 5653.5088 0 0 0 0
## X14085 5169.3498 56 14 35976.8740 200 0 0 0
## X11465 5134.4672 54 12 68870.0159 400 900 0 0
## X12610 1725.2995 60 14 20558.2137 20000 0 0 0
## X785 5496.3173 24 16 30837.3206 1000 0 0 0
## X14485 6354.0137 45 13 17474.4817 1500 360 0 0
## X8580 7333.3380 40 12 39060.6061 1000 0 0 0
## X10340 6355.6933 25 14 67842.1053 500 0 0 0
## X20855 5483.6654 75 8 6270.2552 200 450 0 0
## X5420 7143.7855 43 15 58590.9091 50 1960 1500 0
## X1200 7770.1210 49 12 66814.1946 1200 4900 0 0
## X13395 6239.6906 29 11 125405.1037 2000 7000 0 6000
## X10230 7426.1415 49 14 30837.3206 0 10000 0 0
## X17945 10038.8263 39 17 37004.7847 700 580 0 0
## X565 6382.7943 47 12 10279.1069 300 0 0 0
## X18070 7659.5207 88 7 16446.5710 900 2700 0 2000
## X509511 7620.1135 74 12 48311.8022 3500 0 0 0
## X8940 5120.7298 43 12 54479.2663 3500 15000 0 0
## X11575 9604.9903 66 17 43172.2488 1400 0 0 3600
## X1213512 7998.0705 68 16 104846.8899 14600 7100 0 35000
## X14770 8130.4052 44 14 96623.6045 1800 3800 0 0
## X22015 6105.5579 54 14 14390.7496 1000 1300 0 0
## X4965 5025.3417 49 10 0.0000 0 0 0 0
## X1660 8149.6942 44 13 62702.5518 2500 6000 0 0
## X20795 6880.3630 19 13 19530.3030 470 40 0 4000
## X2045 7719.5388 21 13 35976.8740 2000 0 0 0
## X10235 9916.6980 85 16 24669.8565 1 0 0 180000
## X12060 7335.8692 23 12 22614.0351 100 120 0 0
## X5680 8288.4412 57 14 129516.7464 4900 5000 0 0
## X20215 5133.1927 52 12 19530.3030 2250 410 0 0
## X15375 7898.4771 80 6 17474.4817 0 2500 0 0
## X10740 9507.0043 78 5 7709.3301 0 0 0 0
## X4160 4372.7256 68 16 76065.3908 7000 0 163000 112000
## X310 5950.2488 40 12 21586.1244 2000 500 0 0
## X3235 6509.0382 57 12 51395.5343 700 2200 0 0
## X21055 8250.0749 67 12 12334.9282 10220 0 0 0
## X2620 5284.0466 28 12 0.0000 0 0 0 0
## X1600 4660.6242 37 17 75037.4801 2500 220 0 0
## X1751513 6220.5890 48 16 41116.4274 2000 5000 0 0
## X5765 6225.7422 59 14 41116.4274 0 0 0 0
## X16945 6440.1730 79 12 39060.6061 27000 15000 0 0
## X20830 236.7943 57 17 429666.6667 0 0 150000 0
## X10105 10483.6685 65 17 35976.8740 0 0 0 9000
## X4895 8641.0258 36 12 30837.3206 400 700 0 0
## X9895 4920.1955 55 14 14390.7496 0 0 0 0
## X10650 7902.0620 48 12 19530.3030 100 0 0 0
## X8705 6661.6101 54 12 85316.5869 2500 500 0 0
## X1490 7291.4425 85 12 82232.8549 16000 0 0 0
## X341014 6611.1226 50 16 67842.1053 10 700 0 1
## X1408515 5169.3498 56 14 35976.8740 200 0 0 0
## X16235 7640.4959 28 13 106902.7113 1200 2420 0 0
## X2201516 6105.5579 54 14 14390.7496 1000 1300 0 0
## X17115 6487.3485 43 15 57562.9984 2500 4600 0 1000
## X22110 8414.4992 38 12 56535.0877 1500 1750 0 0
## X5075 8472.5536 78 11 75037.4801 8600 950 0 0
## X3895 6208.4334 54 16 83260.7655 16300 0 62700 56000
## X18550 6467.8824 41 14 152130.7815 150 0 0 0
## X1998017 7630.0979 86 17 37004.7847 10000 20000 0 0
## X10815 5992.9396 56 12 28781.4992 900 560 0 0
## X130 6832.8261 50 17 64758.3732 1100 14700 0 0
## X15700 9210.3635 42 17 115125.9968 5600 1900 0 0
## X10560 7091.0393 78 6 8223.2855 660 0 0 0
## X8180 7339.2602 55 12 15418.6603 0 0 0 0
## X6115 7434.1715 55 13 61674.6411 2000 1000 0 0
## X11495 9240.9040 44 16 81204.9442 5700 2200 0 8800
## X17710 8507.0410 77 9 23641.9458 13000 0 10000 80000
## X10510 7534.6363 19 12 5653.5088 0 0 0 0
## X10990 5651.0074 45 11 35976.8740 2400 0 0 1200
## X13300 5711.5575 59 12 16446.5710 100 70 0 0
## X19315 6143.5717 37 12 25697.7671 1600 5 0 0
## X10685 7444.5161 45 12 23641.9458 0 0 0 0
## X19330 6984.1467 48 14 71953.7480 2000 5600 0 15000
## X16260 6003.7896 46 10 6167.4641 0 10 0 0
## X13945 9197.4307 35 9 34948.9633 1500 5100 0 0
## X2330 6659.7322 46 17 20558.2137 1000 400 6000 40000
## X12080 5664.1469 20 14 21586.1244 700 0 0 0
## X16900 3237.7455 69 17 415275.9171 234400 0 0 1000000
## X1080 11386.7530 31 17 55507.1770 810 45000 0 0
## X19180 2812.2327 53 16 87372.4083 0 80350 236000 20000
## X2925 6746.5369 38 16 149047.0494 2000 15000 0 0
## X7555 7486.2469 62 12 35976.8740 7500 0 42000 0
## X16600 8180.4213 57 17 166521.5311 3000 2700 0 0
## X16795 7270.1531 37 12 65786.2839 400 50 0 0
## X16545 7485.1324 60 12 44200.1595 2000 1500 0 0
## X20245 6984.7124 25 12 56535.0877 2000 0 0 0
## X9180 6503.6143 32 17 87372.4083 0 5500 4000 50000
## X16480 9494.7847 76 14 69897.9266 1500 0 173000 100000
## X17355 11253.9904 57 12 66814.1946 1610 400 0 0
## X5875 7786.5650 80 10 8326.0766 6000 1000 0 0
## X16145 7168.1489 45 14 50367.6236 0 3000 0 2500
## X21770 5079.5219 62 8 7298.1659 0 0 0 0
## X11820 5689.9210 38 16 69897.9266 2501 1200 48000 0
## X9390 7393.9024 65 16 14390.7496 1000 3000 2000 2000
## X12520 6332.1609 67 10 14390.7496 6000 0 0 0
## X12040 11386.7530 76 16 37004.7847 2500 11000 0 38000
## X8890 10431.8465 50 11 114098.0861 15000 0 0 8300
## X150 6767.4428 29 13 47283.8915 100 0 0 7000
## X4600 7132.4826 66 7 34948.9633 1500 800 0 0
## X12490 7344.0563 70 12 13362.8389 40 0 0 0
## X5640 8473.5305 43 12 17474.4817 0 1700 0 0
## X1758518 6373.6917 52 17 57562.9984 1000 0 0 0
## X3105 1720.8497 54 17 444057.4163 0 0 3900000 100000
## X1070019 6501.2709 83 14 58590.9091 0 0 330000 275000
## X7035 8463.5537 49 16 54479.2663 0 0 0 0
## X19950 4181.7756 53 16 243614.8325 8300 3010 0 2300
## X12835 4692.9691 39 12 40088.5167 1500 15 0 0
## X12050 5651.0074 48 12 67842.1053 2500 2800 0 0
## X12605 7292.5295 48 14 116153.9075 2800 410 0 0
## X16605 10121.8582 74 14 67842.1053 6500 0 0 0
## X100 6078.5087 65 9 7914.9123 0 0 0 0
## X21095 6777.8412 51 14 76065.3908 600 0 0 0
## X9640 8069.9291 52 12 35976.8740 400 0 0 0
## X18820 1099.8573 50 17 248754.3860 237500 250000 0 2200
## X20565 11386.7530 40 17 193247.2089 5000 1000 0 23000
## X13035 8220.9304 83 14 81204.9442 7000 193000 0 0
## X15555 307.3822 48 16 176800.6380 5200 1000 650000 0
## X8315 6657.7220 69 16 51395.5343 1000 115000 0 10000
## X990 6273.6366 28 11 69897.9266 3000 0 0 0
## X19305 4792.5614 47 17 93539.8724 5400 0 0 15000
## X14165 8909.1588 54 14 69897.9266 2500 0 28000 62000
## X1785 9471.2816 63 12 21586.1244 2300 0 0 0
## X22045 3394.2432 68 12 80177.0335 2000 15000 0 25000
## X31520 5870.6340 40 12 42144.3381 400 0 0 0
## X21535 8618.8325 66 12 9765.1515 550 5600 0 0
## X8005 7407.0569 45 17 51395.5343 8000 0 0 0
## X21855 5619.8455 28 14 56535.0877 1600 1430 0 0
## X2965 8016.8139 54 16 92511.9617 1100 400 11000 10000
## X19925 8390.7838 38 12 35976.8740 3500 77000 63000 0
## X21305 7544.4668 35 5 10279.1069 0 0 0 0
## X11315 11291.1815 63 17 75037.4801 7200 6200 0 45000
## X2870 6938.1564 37 8 16446.5710 0 200 0 0
## X7845 5128.3639 24 9 37004.7847 200 0 0 0
## X11325 9377.8584 41 13 80177.0335 200 2200 0 300
## X7135 6633.8745 24 14 32893.1419 1700 0 600 0
## X1223521 5988.0417 26 16 61674.6411 3000 0 0 0
## X19955 5867.5632 55 12 43172.2488 7000 0 62000 0
## X12115 4344.4525 36 13 25697.7671 0 2000 0 0
## X20770 4390.4068 26 10 10279.1069 0 0 0 0
## X695 7345.0840 45 12 39060.6061 0 0 0 0
## X11320 10038.8263 41 16 128488.8357 5000 0 0 20000
## X7080 6071.8608 61 12 27753.5885 1300 0 330000 90000
## X21705 6325.1681 42 12 20558.2137 0 0 0 0
## X10855 7418.4265 50 14 102791.0686 5300 1700 0 0
## X6340 7390.7142 46 16 82232.8549 2000 100 0 0
## X17450 7513.2426 57 12 11307.0175 0 0 0 0
## X2895 8886.6097 83 12 13362.8389 1 1000 0 0
## X8115 8155.4622 78 17 98679.4258 7000 0 140000 0
## X430 4603.5133 35 14 88400.3190 810 6000 20000 0
## X1027522 5933.3748 43 16 144935.4067 7700 15300 17000 17000
## X387023 7560.2604 63 12 29809.4099 1500 0 0 0
## X13920 6330.8427 47 12 24669.8565 700 0 0 0
## X3160 4384.6392 46 13 33921.0526 300 0 0 0
## X19125 3276.5295 59 13 154186.6029 2700 0 0 10000
## X21480 1832.4461 60 14 431722.4880 220000 0 990000 156000
## X13180 4989.6261 47 12 32893.1419 750 0 0 0
## X7970 9005.8034 88 10 16446.5710 11000 0 0 0
## X11435 9493.2555 52 15 100735.2472 0 4100 0 0
## X16800 6750.1669 23 13 15418.6603 200 0 0 0
## X5575 6212.7090 86 12 18502.3923 100 0 0 0
## X9880 4017.0656 33 12 29809.4099 700 0 0 0
## X13440 5195.9143 53 8 9765.1515 10 0 0 0
## X17370 5668.8620 46 12 32893.1419 600 0 0 0
## X17200 10178.6915 76 12 26725.6778 1000 5000 0 0
## X3905 6746.5369 41 17 183996.0128 15800 18000 0 60000
## X14000 7172.7607 29 12 4214.4338 0 0 0 0
## X9710 5155.4648 54 12 15418.6603 1600 0 0 0
## X5300 6574.6565 76 12 42144.3381 8500 5200 86000 100000
## X12985 8867.1917 40 12 24669.8565 1200 0 0 0
## X2007524 10164.9687 56 14 37004.7847 1000 0 0 0
## X15575 6836.0441 39 9 32893.1419 2000 0 0 0
## X4245 8696.1124 67 12 20558.2137 400 0 0 0
## X21505 5686.0635 43 12 29809.4099 0 0 0 0
## X4215 7936.4547 22 15 8223.2855 0 30 0 0
## X12535 11794.4073 78 12 14390.7496 700 0 0 0
## X16475 9521.9200 75 10 10279.1069 6800 0 0 0
## X4570 6836.3888 53 13 10279.1069 50 0 0 0
## X15300 8124.9062 45 12 10279.1069 0 0 0 0
## X18200 6704.2828 40 12 121293.4609 810 1400 0 50000
## X2325 7803.2247 45 15 113070.1754 4900 2000 12000 1000
## X3430 4850.2392 61 2 35976.8740 0 1000 0 0
## X7495 6984.7124 32 10 12334.9282 300 0 0 0
## X489525 8641.0258 36 12 30837.3206 400 700 0 0
## X1680026 6750.1669 23 13 15418.6603 200 0 0 0
## X21375 7709.7569 42 14 101763.1579 1000 4201 1400 170000
## X11115 6525.0952 64 9 21586.1244 800 0 0 0
## X5220 5976.1950 26 16 30837.3206 6000 0 0 1000
## X1488027 5711.2392 52 12 94567.7831 2150 2610 0 0
## X21940 6131.7347 46 12 167549.4418 2530 0 7000 0
## X1364528 10246.9474 81 12 44200.1595 51000 0 0 0
## X21040 5845.6749 25 16 25697.7671 50 0 0 0
## X7125 6563.5801 49 13 31865.2313 0 600 0 0
## X8670 5196.5893 21 12 7195.3748 400 110 0 0
## X10640 6599.9746 47 12 136712.1212 1500 0 0 0
## X18375 3907.1193 26 12 29809.4099 1000 2500 0 0
## X20845 7247.0973 25 12 35976.8740 2400 205 0 2500
## X595 6578.6259 51 17 162409.8884 5000 27000 0 3000
## X1455 5851.6456 43 16 103818.9793 1000 0 8000 0
## X8760 4733.7759 81 4 4625.5981 0 0 0 0
## X626029 3036.6357 44 14 177828.5486 0 0 50000 85000
## X8475 11291.1815 53 17 137740.0319 0 2000 350000 150000
## X3085 6093.4654 30 14 54479.2663 300 2000 15000 0
## X9285 3597.7161 67 16 129516.7464 4000 70000 50000 300000
## X6940 7270.8124 49 13 48311.8022 1200 180 0 0
## X557530 6212.7090 86 12 18502.3923 100 0 0 0
## X16710 7779.0121 64 12 6167.4641 4500 0 0 0
## X15515 5222.0284 20 12 14390.7496 20 0 0 0
## X3530 4372.7256 48 16 102791.0686 3200 0 24000 0
## X6860 5487.6011 43 12 52423.4450 0 0 0 0
## X14630 4457.1985 34 12 31865.2313 1100 80 0 0
## X14705 9293.5925 76 14 28781.4992 27000 0 0 0
## X13010 4911.3897 30 12 70925.8373 8800 5000 0 0
## X1792031 6764.6424 57 12 25697.7671 1600 900 0 0
## X12705 5259.2512 65 11 25697.7671 2000 5000 0 0
## X9870 5335.8581 30 12 44200.1595 500 0 0 0
## X17305 6237.3654 32 15 55507.1770 1150 1500 0 0
## X21595 7445.0778 82 12 20558.2137 0 2000 0 2000
## X13725 7519.7320 55 11 27753.5885 0 700 0 4000
## X10040 7735.4996 88 13 11307.0175 0 0 0 0
## X7005 5575.5340 52 10 41116.4274 0 1000 0 0
## X3760 5696.2367 41 12 29809.4099 5000 120 0 0
## X14910 5933.3748 43 16 308373.2057 4000 100000 202000 0
## X13365 9269.3339 55 9 6270.2552 0 10 0 0
## X11410 5960.0579 33 16 119237.6396 2400 0 0 0
## X12220 2814.9047 36 16 855221.6906 60000 230000 350000 0
## X18420 7029.1679 31 13 80177.0335 1550 0 0 3000
## X9005 7842.3749 40 15 21586.1244 0 0 0 0
## X11855 7333.3380 38 12 12334.9282 200 0 0 7000
## X21405 3960.9948 55 11 30837.3206 0 0 0 0
## X21260 9441.6518 64 12 28781.4992 600 0 0 0
## X8020 4444.7730 55 17 8634.4498 0 500 0 0
## X10370 7938.8328 34 14 49339.7129 500 550 0 0
## X15255 7597.5728 73 3 23641.9458 3000 0 0 0
## X12735 5620.7655 43 12 72981.6587 1500 0 0 0
## X2635 8359.0939 58 17 57562.9984 2000 6000 0 0
## X4765 6452.8207 37 14 13362.8389 140 0 0 0
## X20295 6316.2726 38 16 82232.8549 500 200 3000 12000
## X4030 5316.4354 44 12 1233.4928 0 0 0 0
## X21360 9418.0637 31 11 69897.9266 0 2330 0 0
## X858032 7333.3380 40 12 39060.6061 1000 0 0 0
## X15240 11386.7530 30 17 125405.1037 2800 800 100000 15000
## X1052033 4336.5281 25 14 26725.6778 800 1500 0 0
## X8230 2913.2891 66 16 237447.3684 105000 0 0 375000
## X6565 6185.1676 45 17 51395.5343 590 5500 22000 0
## X8210 8582.8214 69 8 21586.1244 0 4000 0 0
## X16370 7369.8167 55 12 12334.9282 2000 0 0 0
## X2495 5634.0684 68 13 46255.9809 4500 15000 0 0
## X4950 9383.2642 78 12 21586.1244 3000 700 0 0
## X20625 3005.5195 67 17 169605.2632 5000 0 0 0
## X12640 6976.6009 38 17 85316.5869 3500 5800 0 0
## X16455 9143.1550 48 16 102791.0686 2000 35000 0 30000
## X20670 7144.1292 36 13 25697.7671 0 0 0 0
## X9855 5315.0495 28 11 2569.7767 0 0 0 0
## X7590 6391.5128 50 12 29809.4099 0 0 0 0
## X10390 8251.3463 82 1 6887.0016 160 0 0 0
## X6885 4867.2383 54 17 239503.1898 12700 0 0 0
## X12630 4832.6968 87 10 35976.8740 1000 95000 190000 0
## X587534 7786.5650 80 10 8326.0766 6000 1000 0 0
## X6415 5711.2392 45 16 97651.5152 1050 10000 0 0
## X13800 5809.2268 72 14 49339.7129 4500 0 184000 0
## X21210 5990.8378 39 12 64758.3732 1000 2500 0 0
## X20775 5111.3136 19 14 6887.0016 700 0 0 0
## X16165 6421.3632 33 16 65786.2839 3500 0 0 1200
## X249535 5634.0684 68 13 46255.9809 4500 15000 0 0
## X18530 5463.0804 57 17 76065.3908 2500 36000 0 0
## X1182036 5689.9210 38 16 69897.9266 2501 1200 48000 0
## X2485 2708.6805 49 17 61674.6411 400 3000 0 65000
## X16785 10609.3570 74 12 15418.6603 1900 0 0 0
## X11750 6777.0313 67 17 104846.8899 10000 22000 3000 0
## X3025 6021.0394 36 14 41116.4274 800 0 0 3500
## X3470 6106.0776 50 12 59618.8198 3000 13900 0 0
## X1860 7640.0210 43 14 97651.5152 4200 1000 0 0
## X3920 5952.7513 28 15 10279.1069 800 0 0 0
## X19430 8253.7779 46 12 19530.3030 500 0 0 0
## X16535 9276.8733 39 13 26725.6778 200 0 0 200
## X13620 7102.5441 50 13 32893.1419 1500 5000 0 0
## X17880 7448.2342 50 14 67842.1053 1700 0 0 0
## X4875 8104.8327 47 15 82232.8549 2000 0 0 50000
## X19300 8951.3784 49 12 7195.3748 0 1500 0 0
## X7075 4214.2722 27 17 32893.1419 400 0 0 0
## X15130 1258.0767 58 17 145963.3174 10000 100000 1350000 500000
## X13555 7031.0802 53 17 204554.2265 35000 0 0 0
## X8385 5165.9872 42 12 98679.4258 200 200 0 0
## X831537 6657.7220 69 16 51395.5343 1000 115000 0 10000
## X1330 6038.9240 21 12 55507.1770 200 550 0 2300
## X6710 6046.9947 44 14 77093.3014 2000 2000 0 0
## X6055 3727.8709 55 16 236419.4577 0 0 0 0
## X20455 2617.2982 46 12 263145.1356 8000 11000 405000 60000
## X2025 1782.7152 44 16 211749.6013 7000 1500 200000 70000
## X8485 8332.8960 69 12 24669.8565 0 0 0 0
## X6475 6829.6252 30 14 42144.3381 1000 0 0 0
## X4305 3570.0930 59 14 74009.5694 14000 0 296000 0
## X6900 9220.5450 71 16 51395.5343 15000 14000 95000 153000
## X14525 3787.5754 39 17 290898.7241 9000 900 0 300000
## X3070 5948.8229 51 16 34948.9633 1000 2620 0 0
## X811538 8155.4622 78 17 98679.4258 7000 0 140000 0
## X1420 5491.3648 58 16 58590.9091 7000 10000 0 0
## X542039 7143.7855 43 15 58590.9091 50 1960 1500 0
## X18380 3800.5231 81 9 101763.1579 36100 1700 135000 100000
## X4185 9295.7487 68 16 34948.9633 630 0 0 0
## X13830 6275.9318 41 14 5139.5534 0 20 10000 0
## X6590 8483.7784 51 12 88400.3190 480 3050 0 0
## X13340 4841.2957 41 8 32893.1419 1500 0 0 0
## X5625 673.0648 35 17 170633.1738 6000 0 7000 0
## X9625 8567.4103 58 14 93539.8724 4100 6000 0 150000
## X12020 7758.8189 75 9 17474.4817 3000 0 0 0
## X9580 3597.7161 53 16 170633.1738 10000 30 500000 0
## X277040 7022.5454 62 16 47283.8915 2700 2000 0 0
## X50 4662.1601 32 14 79149.1228 0 2000 0 0
## X369541 10109.2136 49 17 76065.3908 2000 500 0 0
## X7540 5516.1131 45 13 22614.0351 300 0 0 0
## X1030 10556.2567 77 12 29809.4099 1000 1000 0 0
## X14400 8454.0719 72 10 4008.8517 0 0 0 0
## X7415 3639.7242 41 14 29809.4099 0 0 0 2500
## X3990 6367.9364 65 5 12334.9282 0 0 0 0
## X3245 6936.3230 63 12 53451.3557 3300 0 0 60000
## X2575 5576.2209 67 17 65786.2839 3000 200 4000 0
## X9105 7529.2122 60 12 20558.2137 1500 0 0 0
## X7985 5978.3895 26 14 89428.2297 2000 0 0 0
## X1300 8554.5639 79 14 31865.2313 810 5100 0 17000
## X4760 4867.2383 39 17 78121.2121 400 0 0 116000
## X16305 6168.6436 22 12 33921.0526 60 0 0 0
## X21035 6648.1001 48 14 45228.0702 0 7500 30000 0
## X2905 5929.3158 75 13 9251.1962 3000 30000 0 0
## X1610 8138.5659 72 12 25697.7671 11000 8500 0 0
## X3490 5496.9282 56 12 15418.6603 4000 500 0 0
## X16585 9117.2509 21 12 28781.4992 50 0 0 0
## X4145 9942.5215 66 10 25697.7671 0 0 0 0
## X3135 7973.0718 35 12 61674.6411 0 0 0 0
## X6000 5336.4730 25 16 20558.2137 1500 3000 20000 0
## X10420 5478.6751 39 12 51395.5343 0 0 0 0
## X1655 5586.1606 58 14 34948.9633 0 0 0 0
## X10705 7409.4581 48 7 12334.9282 4000 0 0 0
## X11735 6187.6895 49 16 122321.3716 20000 0 71000 0
## X6720 5215.7507 22 12 18502.3923 1200 500 0 0
## X12680 8925.5408 27 12 37004.7847 2150 500 0 1000
## X7530 10000.0270 66 12 25697.7671 1200 6000 0 0
## X7795 6111.3967 41 13 40088.5167 120 1 0 0
## X1480 5140.6393 49 16 29809.4099 1800 21600 0 0
## X21575 6161.6719 43 2 68870.0159 1740 10500 0 0
## X2585 9636.8011 58 10 83260.7655 8000 0 0 0
## X16595 7065.6086 52 6 5550.7177 0 0 0 0
## X4040 7888.3857 68 7 14390.7496 100 0 0 0
## X1630542 6168.6436 22 12 33921.0526 60 0 0 0
## X16330 6551.0792 25 16 71953.7480 5000 2740 0 0
## X15665 8384.9108 36 17 12334.9282 2500 26000 0 21000
## X690043 9220.5450 71 16 51395.5343 15000 14000 95000 153000
## X6795 8112.0825 43 16 121293.4609 300 100 0 0
## X21350 9864.7351 68 12 24669.8565 1200 200 0 0
## X6700 1503.1836 66 16 289870.8134 0 0 275000 88000
## X18665 6813.5822 45 14 53451.3557 0 5000 0 0
## X19580 8548.9432 29 14 98679.4258 32800 20000 0 20000
## X20130 8003.3654 32 14 45228.0702 500 1700 10000 2500
## X10325 4593.4824 25 15 23641.9458 2300 3420 0 0
## X4130 3284.9326 35 16 122321.3716 6000 5000 110000 0
## X5475 6092.8720 57 3 6373.0463 3290 9200 0 0
## X1790 7500.1937 66 12 67842.1053 1300 0 0 0
## X17480 8048.4219 41 16 60646.7305 1000 3850 0 0
## X12830 5932.7371 47 12 55507.1770 1100 41000 0 0
## X1865 6282.7566 51 9 62702.5518 3500 0 0 0
## X768044 7263.7921 52 12 35976.8740 1700 3000 2000 0
## X457045 6836.3888 53 13 10279.1069 50 0 0 0
## X11490 8507.8199 88 12 5756.2998 2620 12400 0 5000
## X1146546 5134.4672 54 12 68870.0159 400 900 0 0
## X10180 5948.7354 23 16 20558.2137 2000 100 0 1000
## X3910 5973.1357 35 15 44200.1595 3000 55000 550000 0
## X11565 11386.7530 49 16 100735.2472 2500 450 0 4000
## X21825 6245.2759 24 14 30837.3206 1500 0 0 0
## X4525 4297.7367 42 16 17474.4817 1000 0 0 0
## X3060 8414.8911 76 12 14390.7496 900 0 0 0
## X9250 7609.5092 61 10 25697.7671 1000 1100 0 0
## X17500 5861.6929 56 12 53451.3557 21000 20800 0 0
## X1100 5302.7948 40 12 35976.8740 0 0 0 0
## X16025 7138.9429 42 12 56535.0877 1200 1251 0 0
## X12380 5632.2290 40 16 66814.1946 1000 1500 0 0
## X753547 5532.8460 23 14 25697.7671 1200 1 0 0
## X12850 4689.7790 33 16 61674.6411 3500 5600 0 0
## X1955548 8909.1588 47 14 25697.7671 17320 730 0 0
## X19775 6366.6587 81 12 9353.9872 750 2000 0 0
## X11525 4683.3579 52 10 8017.7033 390 0 0 0
## X2975 9116.0082 72 12 24669.8565 100 500 0 0
## X18895 10098.3165 23 12 18502.3923 2600 0 0 0
## X1602549 7138.9429 42 12 56535.0877 1200 1251 0 0
## X345 6272.7234 44 9 12334.9282 9200 0 0 0
## X490 6976.6009 44 16 116153.9075 3000 0 0 0
## X14580 1435.8448 50 16 488257.5758 10000 0 10000000 8000000
## X10875 7706.5913 41 17 63730.4625 2000 11500 0 0
## X5270 5113.1426 32 16 21586.1244 40 0 0 0
## X9400 8294.9122 57 14 24669.8565 0 230 0 0
## X12900 8693.4985 65 15 27753.5885 7200 3200 0 0
## X4530 8281.4292 79 12 14390.7496 0 35000 0 0
## X17670 6587.6319 42 8 30837.3206 0 0 0 0
## X5440 7096.7500 34 14 54479.2663 300 30 0 0
## X8875 4908.0300 36 12 62702.5518 3000 1030 0 0
## X2060 10858.1982 85 11 22614.0351 340 3500 0 0
## X2153550 8618.8325 66 12 9765.1515 550 5600 0 0
## X5080 6983.5455 49 12 51395.5343 1500 0 0 0
## X12500 7868.5326 45 16 68870.0159 1000 10000 300 22000
## X830 5696.6397 28 14 8634.4498 100 0 0 0
## X495051 9383.2642 78 12 21586.1244 3000 700 0 0
## X1304052 5263.6488 40 13 66814.1946 0 380 0 0
## X16685 6956.0287 24 12 30837.3206 300 200 0 0
## X5695 6842.0098 82 16 17474.4817 1900 0 0 0
## X2026553 4733.4575 40 16 28781.4992 820 400 0 18000
## X215 6221.1912 53 17 74009.5694 11000 0 0 25000
## X7460 9759.8671 45 12 43172.2488 3200 600 5000 0
## X21060 7675.9421 92 16 9251.1962 2500 18000 0 0
## X3770 6271.7559 61 12 56535.0877 20000 7000 35000 0
## X940054 8294.9122 57 14 24669.8565 0 230 0 0
## X15320 7356.9559 61 14 11307.0175 600 0 0 0
## X96555 5837.2792 83 12 14390.7496 600 20000 0 0
## X19340 9125.7408 78 8 18502.3923 0 0 0 0
## X1395 6579.0643 47 16 35976.8740 1300 500 0 0
## X939056 7393.9024 65 16 14390.7496 1000 3000 2000 2000
## X5245 8229.2596 61 6 11307.0175 430 0 0 0
## X18830 9253.5661 45 14 22614.0351 450 600 0 0
## X15215 6956.5608 39 16 66814.1946 2000 500 15000 0
## X496557 5025.3417 49 10 0.0000 0 0 0 0
## X12210 7636.2540 35 12 24669.8565 0 22000 0 0
## X17560 9704.9669 66 12 66814.1946 2500 5000 0 0
## X19625 8823.5948 63 1 17474.4817 4000 14700 0 0
## X2530 7480.5053 41 12 20558.2137 1000 0 46000 500
## X9075 6474.0924 41 12 66814.1946 1000 1000 0 0
## X1925 8462.7026 88 11 11307.0175 0 9900 0 0
## X21010 7610.4252 28 16 59618.8198 0 23000 50000 0
## X1745058 7513.2426 57 12 11307.0175 0 0 0 0
## X17555 2695.1985 39 13 42144.3381 700 0 0 0
## X2018059 8410.7240 61 15 101763.1579 450 0 0 0
## X5330 2017.1634 77 16 117181.8182 7000 0 0 100000
## X2150560 5686.0635 43 12 29809.4099 0 0 0 0
## X2970 4411.1662 56 12 13362.8389 0 2100 0 0
## X19190 9186.7808 35 17 68870.0159 200 1700 0 0
## X12570 8712.1533 39 17 23641.9458 1200 0 0 0
## X1325 6437.8045 66 12 32893.1419 900 0 200 0
## X4195 6522.8860 33 14 81204.9442 5000 3000 0 0
## X20915 2494.8521 45 12 61674.6411 3000 0 0 0
## X14145 6934.1103 46 12 77093.3014 87800 8400 19600 1000
## X13090 5132.5279 52 16 33921.0526 300 0 0 0
## X2211061 8414.4992 38 12 56535.0877 1500 1750 0 0
## X13062 6832.8261 50 17 64758.3732 1100 14700 0 0
## X7190 5867.3346 59 12 4522.8070 0 10 0 0
## X10690 8483.6259 79 9 12334.9282 200 0 0 0
## X21495 5951.9803 29 11 25697.7671 2000 0 0 0
## X3745 5958.4027 47 13 87372.4083 1000 370 0 0
## X2315 6840.0838 22 12 18502.3923 0 10 0 0
## X3170 6539.6647 51 3 10279.1069 160 0 0 0
## X10940 9740.7156 76 14 46255.9809 3200 1700 103000 20000
## X2116563 5386.4622 40 14 71953.7480 0 110000 135000 0
## X2109564 6777.8412 51 14 76065.3908 600 0 0 0
## X233065 6659.7322 46 17 20558.2137 1000 400 6000 40000
## X17530 5609.5420 38 16 31865.2313 1000 0 34000 0
## X12410 1832.4461 66 16 430694.5773 4500 0 1220000 0
## X1694566 6440.1730 79 12 39060.6061 27000 15000 0 0
## X1250 7269.0062 41 12 52423.4450 4270 6010 0 0
## X1323567 5454.0787 29 14 9251.1962 2000 20000 0 0
## X17190 11386.7530 42 12 207637.9585 2100 1100 6200 0
## X103068 10556.2567 77 12 29809.4099 1000 1000 0 0
## X1018069 5948.7354 23 16 20558.2137 2000 100 0 1000
## X18295 10394.6521 53 11 16446.5710 100 200 0 0
## X8770 10597.7984 76 16 18502.3923 11300 0 0 0
## X585 8414.4992 38 12 82232.8549 2100 3000 0 0
## X8750 3316.2325 35 17 105874.8006 0 0 0 0
## X13955 5457.0780 63 9 16446.5710 100 7800 0 0
## X18825 216.1459 60 16 256977.6714 1000 0 500000 5000000
## X12280 7685.7706 42 16 75037.4801 0 0 0 0
## X21780 5711.2392 53 14 56535.0877 1000 0 0 0
## X17810 6746.5369 62 17 105874.8006 4000 14000 0 15000
## X2535 5622.6830 62 16 64758.3732 1500 12000 0 0
## X1614070 6677.0208 68 14 44200.1595 22000 50000 0 0
## X17290 6934.7265 60 14 22614.0351 1500 3000 0 0
## X15400 1832.4461 59 12 325847.6874 1500 56000 17000 18000
## X1842071 7029.1679 31 13 80177.0335 1550 0 0 3000
## X1650 7512.8658 52 13 80177.0335 1000 0 0 0
## X2050072 8349.2691 26 10 19530.3030 0 0 0 0
## X360 9617.6686 29 11 17474.4817 0 100 0 0
## X1895 9047.5141 41 17 118209.7289 2550 2200 35000 95000
## X7350 5915.7400 40 12 89428.2297 20 0 0 0
## X107573 7485.5250 77 12 12334.9282 1700 12390 43000 0
## X2204574 3394.2432 68 12 80177.0335 2000 15000 0 25000
## X10345 6435.6192 41 9 18502.3923 20 0 0 0
## X90575 7456.1503 23 11 12334.9282 700 0 0 0
## X75 5826.0300 41 16 135684.2105 3000 2000 0 3000
## X1550 8535.9389 32 14 72981.6587 1050 310 0 0
## X6040 6911.1151 50 12 27753.5885 500 0 0 0
## X3525 11291.1815 75 17 104846.8899 5000 0 0 0
## X6980 6885.7776 65 12 69897.9266 590 8450 0 900
## X178576 9471.2816 63 12 21586.1244 2300 0 0 0
## X16935 6093.3713 22 12 24669.8565 8000 0 0 0
## X928577 3597.7161 67 16 129516.7464 4000 70000 50000 300000
## X9615 6226.2105 42 15 19530.3030 0 0 0 0
## X17940 7647.2525 21 12 33921.0526 280 500 0 0
## X2855 5717.5206 66 12 9148.4051 0 0 0 0
## X17900 8173.5225 56 6 27753.5885 590 0 0 0
## X1222078 2814.9047 36 16 855221.6906 60000 230000 350000 0
## X2020 11188.2330 75 10 12334.9282 1050 4000 0 0
## X10380 6301.9789 24 15 3083.7321 0 0 0 0
## X4000 9404.8200 56 12 32893.1419 2000 0 19200 0
## X90 7762.3371 63 12 49339.7129 500 3000 0 400
## X19145 6279.5012 46 17 88400.3190 1000 1500 69600 0
## X9140 4396.2365 59 15 0.0000 1600 0 0 0
## X2300 5313.0456 29 11 15418.6603 30 5 0 0
## X13560 4701.5738 50 12 18502.3923 600 130 0 0
## X1767079 6587.6319 42 8 30837.3206 0 0 0 0
## X9665 8183.9394 41 17 141851.6746 230 590 200 14000
## X9065 6428.1923 62 14 28781.4992 500 4500 0 0
## X12715 5000.2165 52 12 87372.4083 380 0 0 0
## X1069080 8483.6259 79 9 12334.9282 200 0 0 0
## X15150 5438.2853 42 15 41116.4274 0 48000 0 0
## X14780 3597.7161 71 16 121293.4609 27000 12000 900000 1000000
## X3080 5431.6661 46 16 131572.5678 1410 3340 0 0
## X1023081 7426.1415 49 14 30837.3206 0 10000 0 0
## X9725 8641.0258 42 12 107930.6220 2500 3700 0 0
## X1330082 5711.5575 59 12 16446.5710 100 70 0 0
## X3215 11386.7530 41 16 136712.1212 4000 103000 18000 70000
## X1069083 8483.6259 79 9 12334.9282 200 0 0 0
## X19635 4799.3712 49 9 10279.1069 0 0 0 0
## X14800 6505.3275 35 12 82232.8549 3000 0 0 0
## X2105584 8250.0749 67 12 12334.9282 10220 0 0 0
## X21380 4683.3579 51 6 11307.0175 0 0 0 0
## X2024585 6984.7124 25 12 56535.0877 2000 0 0 0
## X11040 6972.1301 33 16 34948.9633 300 1050 0 800
## X12070 8012.1578 73 8 18502.3923 920 0 0 0
## X3465 9199.2632 30 16 141851.6746 6000 510 13000 23000
## X20725 9942.5215 90 10 6167.4641 0 0 0 0
## X15730 8244.9929 29 12 113070.1754 100 30000 0 0
## X17005 1503.1836 73 17 154186.6029 2500 3000 315000 18000
## X4065 6197.2818 59 13 34948.9633 2200 0 0 0
## X620 6750.9058 56 12 18502.3923 300 0 0 0
## X1776586 6235.1707 83 15 7812.1212 2000 0 0 0
## X2715 8028.6114 32 12 57562.9984 0 0 0 1000
## X5210 9050.9752 48 14 46255.9809 800 1000 5000 0
## X1303587 8220.9304 83 14 81204.9442 7000 193000 0 0
## X18625 6316.2726 43 14 49339.7129 700 3310 30000 0
## X20460 4841.8475 25 12 28781.4992 0 300 0 0
## X4700 6088.9010 49 16 30837.3206 2000 0 0 0
## X256588 6641.8552 42 16 88400.3190 2000 44000 100000 20000
## X4180 146.7205 49 16 678421.0526 0 10000 1000000 0
## X5760 7480.3327 51 16 178856.4593 3750 0 0 0
## X286589 10911.3427 65 11 26725.6778 7000 6000 7500 22000
## X5745 5771.2818 34 12 18502.3923 2700 0 0 0
## X5175 5026.4489 36 11 19530.3030 0 300 0 0
## X15105 6017.3025 48 8 26725.6778 10 4000 0 0
## X19895 5145.1112 58 12 5139.5534 310 0 0 0
## X1210 6281.8256 51 16 140823.7640 11200 28000 0 0
## X1998090 7630.0979 86 17 37004.7847 10000 20000 0 0
## X202091 11188.2330 75 10 12334.9282 1050 4000 0 0
## X14570 6568.5137 33 9 42144.3381 1000 1700 9000 0
## X2100 9341.9182 47 12 35976.8740 480 0 0 0
## X1208092 5664.1469 20 14 21586.1244 700 0 0 0
## X21340 6229.3682 22 14 22614.0351 1000 2000 0 25000
## X14250 6519.0232 30 7 24669.8565 0 0 0 0
## X1719093 11386.7530 42 12 207637.9585 2100 1100 6200 0
## X2105594 8250.0749 67 12 12334.9282 10220 0 0 0
## X11425 6084.3695 54 17 160354.0670 7230 1500 39000 20000
## X2135095 9864.7351 68 12 24669.8565 1200 200 0 0
## X13565 6017.3025 50 15 35976.8740 50 0 0 0
## X17540 3468.5962 42 17 164465.7097 0 8100 84000 100000
## X2985 8494.8290 77 7 15418.6603 0 0 0 0
## X15070 4397.4675 30 16 15418.6603 2200 500 4800 0
## X3505 9857.4584 70 13 19530.3030 1800 11000 0 0
## X15015 9887.6811 55 14 66814.1946 1530 0 0 0
## X16815 4348.7029 32 12 13362.8389 500 0 0 0
## X7485 6228.4329 45 16 41116.4274 0 0 0 0
## X18460 5665.4249 64 16 65786.2839 0 0 65000 0
## X9465 5717.0688 42 13 54479.2663 0 0 0 0
## X10825 6128.8931 32 9 15418.6603 50 2500 0 0
## X8105 7108.2801 34 16 71953.7480 1150 11500 0 170000
## X5820 6043.2868 54 13 38032.6954 1000 22000 0 0
## X14765 11386.7530 71 12 38032.6954 4500 3000 0 160000
## X5340 2818.4551 54 17 175772.7273 6000 35010 140000 7000
## X3720 4804.4293 34 9 22614.0351 0 0 0 0
## X4475 6685.2809 59 17 25697.7671 350 0 35000 0
## X15185 9717.4090 36 12 30837.3206 520 1900 6000 0
## X68096 6833.6584 48 16 100735.2472 3800 0 13000 0
## X13745 7513.1999 39 11 11307.0175 0 0 0 0
## X125097 7269.0062 41 12 52423.4450 4270 6010 0 0
## X8345 6991.5808 33 11 53451.3557 400 1200 0 0
## X2435 6752.1781 57 13 17474.4817 280 5 0 0
## X1900 3591.7791 77 17 38032.6954 6000 15000 175000 250000
## X11670 5731.3659 58 16 18502.3923 5700 2500 24000 0
## X18465 8998.8304 61 12 47283.8915 5200 4000 0 0
## X20605 9663.4260 39 12 27753.5885 1000 1500 0 0
## X1127598 10483.6685 68 12 27753.5885 3300 8000 32000 116000
## X15815 7402.3350 41 12 49339.7129 1500 100 0 0
## X4465 10051.0999 69 15 61674.6411 4500 0 28000 0
## X14585 6609.2130 49 2 134656.2998 500 0 30000 1000
## X12930 6801.3996 40 14 43172.2488 2500 1000 20000 0
## X3875 6076.2830 33 14 41116.4274 1000 6000 0 0
## X11340 5351.9437 79 14 41116.4274 2300 15000 0 50000
## X4985 7666.4941 56 12 113070.1754 880 0 0 50000
## X21245 7308.9786 33 12 29809.4099 1 0 0 0
## X1203599 5803.8741 35 12 46255.9809 770 0 0 0
## X18640 9424.8965 70 16 21586.1244 150 3000 0 0
## X3875100 6076.2830 33 14 41116.4274 1000 6000 0 0
## X7570 7388.8300 37 12 48311.8022 2500 150 0 0
## X16115 7197.0427 46 16 102791.0686 0 5000 0 200000
## X6355 6830.9965 46 12 29809.4099 0 0 0 0
## X17615 7230.1995 48 12 30837.3206 30 400 0 0
## X6920 4859.1365 80 3 5550.7177 0 0 0 0
## X8960 4327.1130 32 16 83260.7655 3540 330 0 0
## X2325101 7803.2247 45 15 113070.1754 4900 2000 12000 1000
## X255 8959.1771 57 14 77093.3014 2000 10000 0 0
## X19515 4547.3081 28 16 82232.8549 1110 0 20000 25000
## X19205102 7691.5051 44 12 53451.3557 1000 1050 0 0
## X15825 6525.3098 38 16 58590.9091 1000 0 0 0
## X9090 5431.0831 47 14 39060.6061 500 580 0 0
## X4540 10335.8356 83 8 7709.3301 700 8000 0 0
## X15225 1483.1859 46 16 575629.9840 13460 59970 330000 600000
## X10300 4752.0953 24 12 7400.9569 0 120 0 0
## X21650 2980.8019 36 14 41116.4274 2000 200 0 0
## X3780 4750.7510 55 12 10279.1069 10 0 0 0
## X13835 7147.1438 54 15 57562.9984 2500 0 0 0
## X5100 7138.9429 41 13 51395.5343 2000 0 0 0
## X11025 5317.0913 43 2 65786.2839 3000 8700 0 0
## X13740 5573.7811 36 12 32893.1419 2700 6000 0 0
## X17905 8897.0175 33 16 32893.1419 900 60 0 0
## X925 4893.8646 54 13 41116.4274 2000 500 0 0
## X11925 9050.7760 42 17 12334.9282 510 0 0 0
## X5210103 9050.9752 48 14 46255.9809 800 1000 5000 0
## X14005 4375.7084 24 12 11307.0175 0 0 0 0
## X17815 6270.3754 56 16 102.7911 1000 0 0 0
## X14200 6430.0038 43 12 57562.9984 610 2890 0 0
## X2855104 5717.5206 66 12 9148.4051 0 0 0 0
## X310105 5950.2488 40 12 21586.1244 2000 500 0 0
## X15355 10936.6997 71 17 26725.6778 4000 0 2000 0
## X15135 7500.0875 30 13 43172.2488 6500 3800 0 0
## X9020 8109.0436 30 16 83260.7655 11000 0 0 0
## X18630 9692.4653 42 10 82232.8549 2520 800 0 0
## X17315 9117.2509 23 12 15418.6603 0 105 0 0
## X19685 5755.9082 49 12 53451.3557 1000 300 0 0
## X7100 5700.7692 42 12 30837.3206 400 220 0 0
## X12945106 6490.4551 27 12 28781.4992 0 0 4000 0
## X7655 1537.8345 39 16 411164.2743 18000 0 50000 50000
## X20875 4540.8913 31 12 6681.4195 0 1650 0 0
## X9300 7690.7114 32 13 52423.4450 210 700 0 0
## X16905107 11386.7530 76 16 28781.4992 5600 6800 48000 100000
## X155 6972.1301 31 12 46255.9809 4300 300 0 0
## X15280 4540.8913 32 13 14390.7496 0 0 0 0
## X17385 5748.4001 85 14 9045.6140 1200 0 0 0
## X12880 9610.2593 29 16 69897.9266 2400 250 0 0
## X1595 8089.9766 43 17 175772.7273 2990 1800 103000 2500
## X5720 7359.6699 30 12 51395.5343 200 3000 0 0
## X17375 5950.0337 58 14 100735.2472 500 18000 0 0
## X11495108 9240.9040 44 16 81204.9442 5700 2200 0 8800
## X7680109 7263.7921 52 12 35976.8740 1700 3000 2000 0
## X2590 6645.0764 48 15 28781.4992 1820 850 0 0
## X7200 9629.4836 72 12 10279.1069 1000 0 0 0
## X1575 6805.1027 38 10 61674.6411 850 0 0 0
## X12065 4620.4798 37 12 6887.0016 0 0 0 0
## X9715 8414.4992 39 17 123349.2823 2500 2000 0 0
## X5065 5298.4252 62 10 11307.0175 0 0 0 0
## X9520 4900.7021 39 13 30837.3206 1000 2300 0 0
## X20565110 11386.7530 40 17 193247.2089 5000 1000 0 23000
## X2100111 9341.9182 47 12 35976.8740 480 0 0 0
## X5175112 5026.4489 36 11 19530.3030 0 300 0 0
## X15880 10431.8465 53 12 40088.5167 1300 4500 0 0
## X615 6944.9344 22 12 18502.3923 1000 0 0 0
## X19490 6198.3243 47 17 144935.4067 2500 8000 0 7500
## X13850 5611.6356 32 13 38032.6954 400 0 0 0
## X14070 7228.2467 74 12 33921.0526 520 3900 0 0
## X16555113 7602.8631 71 12 16446.5710 1000 0 0 0
## X21750 6950.8753 52 17 39060.6061 300 1200 0 0
## X1305 7662.7554 35 13 123349.2823 1600 15500 0 35000
## X5210114 9050.9752 48 14 46255.9809 800 1000 5000 0
## X14400115 8454.0719 72 10 4008.8517 0 0 0 0
## X4120 3435.4000 83 16 30837.3206 10000 0 0 0
## X13600 9383.0100 32 14 92511.9617 0 1300 0 55000
## X1670 11259.5753 66 12 18502.3923 800 0 0 0
## X8790 7029.6621 44 16 100735.2472 400 1900 0 1300
## X8150 6310.8060 47 12 53451.3557 750 5000 0 0
## X14155 6695.2174 73 4 33921.0526 11500 0 0 0
## X17905116 8897.0175 33 16 32893.1419 900 60 0 0
## X16735 5804.3179 66 16 51395.5343 1000 0 0 0
## X21095117 6777.8412 51 14 76065.3908 600 0 0 0
## X10280 9401.2080 80 16 42144.3381 9200 0 0 0
## X8695 6033.5784 22 12 29809.4099 0 0 0 0
## X15485 5972.4428 50 14 4933.9713 0 0 0 0
## X920118 5812.9110 44 15 87372.4083 300 32700 0 0
## X525 8516.9613 73 12 9559.5694 1300 0 0 0
## X10740119 9507.0043 78 5 7709.3301 0 0 0 0
## X8885 5016.1313 47 12 41116.4274 240 250 0 0
## X20200 5845.6749 26 14 52423.4450 1000 300 0 0
## X2295 7105.8186 69 6 17474.4817 720 0 0 0
## X14855 4017.0656 34 12 27753.5885 800 0 0 0
## X20390 10431.8465 50 13 48311.8022 3000 0 0 0
## X13895 5545.0508 35 12 37004.7847 1020 0 770 0
## X12335 8147.8136 76 12 30837.3206 18000 11000 0 0
## X11880 5934.1263 54 12 63730.4625 0 0 0 0
## X3750 1541.4366 65 15 65786.2839 3000 2000 0 2500
## X16305120 6168.6436 22 12 33921.0526 60 0 0 0
## X11875 6985.4172 42 9 45228.0702 0 100 0 0
## X7670121 7111.7751 50 14 53451.3557 3100 0 0 0
## X6130 7499.2157 41 12 11307.0175 0 0 0 0
## X8050 7657.5067 56 7 47283.8915 150 0 0 0
## X17970 8302.0167 72 16 15418.6603 0 0 0 0
## X18735 4949.8349 49 16 10279.1069 0 50 0 0
## X6920122 4859.1365 80 3 5550.7177 0 0 0 0
## X235 4923.1401 48 12 42144.3381 0 0 0 0
## X5530 6282.3042 52 17 118209.7289 0 0 10000 10000
## X18720 5887.3046 49 12 2055.8214 0 400 0 0
## X2330123 6659.7322 46 17 20558.2137 1000 400 6000 40000
## X335 4177.0529 41 12 25697.7671 0 1500 0 0
## X19405 5895.3995 29 12 69897.9266 2830 910 0 0
## X2615 8241.5555 46 12 60646.7305 2400 1500 0 0
## X8060 6654.5552 67 12 61674.6411 2200 15000 0 100000
## X7990 4922.4516 27 14 24669.8565 630 0 0 0
## X17205 8028.6114 34 5 37004.7847 1400 200 0 0
## X110 10373.1531 79 12 16446.5710 800 0 0 0
## X16470124 3597.7161 43 12 1408237.6396 10 0 0 0
## X8690 7624.4548 48 12 82232.8549 4000 7000 0 0
## X3350 11386.7530 58 13 30837.3206 11000 159900 0 2500
## X2440 5845.6749 30 15 46255.9809 1020 0 0 5700
## X9570 6262.5384 20 14 24669.8565 100 0 0 0
## X19650 7270.8124 48 13 29809.4099 650 7000 40000 25000
## X12680125 8925.5408 27 12 37004.7847 2150 500 0 1000
## X6175 7664.6497 52 11 68870.0159 1800 0 0 1200
## X2860126 6604.7905 45 12 20558.2137 1500 530 0 0
## X21470 9663.4260 42 12 25697.7671 600 0 0 0
## X9360 5700.4279 22 13 82232.8549 300 5500 0 0
## X3235127 6509.0382 57 12 51395.5343 700 2200 0 0
## X10540 3529.8025 52 17 102791.0686 11000 0 464000 120000
## X18595 1503.9178 49 17 116153.9075 5000 66500 115000 19000
## X13935 5057.4235 86 17 72981.6587 24000 0 60000 35000
## X16950 5315.7504 52 16 114098.0861 10000 10000 0 0
## X9715128 8414.4992 39 17 123349.2823 2500 2000 0 0
## X11980 8233.0605 20 11 20558.2137 200 0 0 0
## X2345 6804.0267 40 17 53451.3557 4500 520 70000 0
## X21130129 11097.5342 78 12 15418.6603 310 0 0 0
## X12600 6482.1927 53 12 35976.8740 600 2500 0 0
## X3470130 6106.0776 50 12 59618.8198 3000 13900 0 0
## X9425 5837.2792 89 14 13362.8389 800 10000 0 0
## X21625 6119.4284 27 12 32893.1419 1000 0 0 0
## X13110 6654.0201 68 16 15418.6603 160 16000 0 0
## X10765 8414.4992 37 12 48311.8022 1000 770 0 2000
## X10290 6556.6396 37 14 95595.6938 1500 5600 80000 0
## X20650 11386.7530 48 16 31865.2313 5000 200 40000 0
## X20680 5009.8727 42 12 53451.3557 2000 0 84000 0
## X20325 7403.3590 38 13 70925.8373 2500 38800 0 0
## X15740 5752.8640 54 6 15418.6603 0 0 0 0
## X10040131 7735.4996 88 13 11307.0175 0 0 0 0
## X2085 7857.7937 38 10 10176.3158 780 0 0 0
## X18375132 3907.1193 26 12 29809.4099 1000 2500 0 0
## X15970 3029.4357 58 14 50367.6236 3000 0 180000 29000
## X15490 6835.9388 43 16 56535.0877 1200 24150 3500 0
## X9805 7615.4140 47 14 14390.7496 200 0 0 0
## X19805 5974.4062 32 13 62702.5518 70 0 0 0
## X6710133 6046.9947 44 14 77093.3014 2000 2000 0 0
## X20265134 4733.4575 40 16 28781.4992 820 400 0 18000
## X16850 4798.3702 27 7 32893.1419 20 0 0 0
## X10875135 7706.5913 41 17 63730.4625 2000 11500 0 0
## X10560136 7091.0393 78 6 8223.2855 660 0 0 0
## X19625137 8823.5948 63 1 17474.4817 4000 14700 0 0
## X13545 6930.3176 64 10 37004.7847 300 3000 0 0
## X13725138 7519.7320 55 11 27753.5885 0 700 0 4000
## X13385 6683.4423 40 12 81204.9442 2000 0 1000 0
## X16935139 6093.3713 22 12 24669.8565 8000 0 0 0
## X4930 11139.3320 79 7 21586.1244 8000 15000 0 0
## X20930 7581.9742 44 12 41116.4274 530 800 0 500
## X17100 5971.9347 57 14 28781.4992 8600 13000 0 5000
## X18795 8287.0909 78 11 19530.3030 3000 14000 0 0
## X1315 4712.3192 42 12 29809.4099 0 1500 0 0
## X3990140 6367.9364 65 5 12334.9282 0 0 0 0
## X6590141 8483.7784 51 12 88400.3190 480 3050 0 0
## X10940142 9740.7156 76 14 46255.9809 3200 1700 103000 20000
## X17560143 9704.9669 66 12 66814.1946 2500 5000 0 0
## X300 3764.5960 41 12 31865.2313 60 200 0 0
## X11475 7178.3649 23 16 74009.5694 810 0 3000 0
## X15370 5071.1464 52 12 44200.1595 600 340 0 0
## X12230 5301.4026 31 10 15418.6603 130 0 0 0
## X6570 7914.2643 58 13 21586.1244 200 200 0 0
## X13610 7169.1759 36 11 51395.5343 80 10 0 0
## X1940 7388.8300 37 17 59618.8198 4500 6000 0 0
## FIN VEHIC HOMEEQ OTHNFIN DEBT NETWORTH
## X17470 39600 6400 84000 0 40200.0 170800
## X315 5400 21000 8000 0 58640.0 17760
## X8795 15460 2000 12000 0 19610.0 9850
## X10720 54700 18250 90000 0 8000.0 284950
## X19170 12800 9100 47000 0 21000.0 268900
## X22075 70500 7500 175000 0 0.0 253000
## X12235 16000 16000 0 0 31000.0 1000
## X7670 12200 34000 22000 0 60600.0 45600
## X16555 13000 1800 15000 0 0.0 29800
## X370 50 1300 0 0 9800.0 -450
## X7680 12700 4200 8000 0 92000.0 24900
## X6880 0 3300 15000 0 3400.0 14900
## X16570 64100 31000 0 0 36200.0 58900
## X12945 4000 9400 0 0 1500.0 11900
## X6725 9050 8800 75000 0 0.0 92850
## X15725 1238000 69000 1600000 0 0.0 4032000
## X19880 4015 38000 7000 0 147400.0 -7385
## X225 813000 15000 130000 0 0.0 975000
## X4995 393440 14400 315000 0 0.0 722840
## X7700 48750 15400 20000 0 230000.0 194150
## X11375 20300 37800 52000 0 18810.0 91290
## X17920 111500 11000 88000 0 32340.0 200160
## X12365 120520 26900 87500 0 17300.0 230120
## X920 93000 7700 59000 0 66000.0 159700
## X19050 313500 7300 500000 0 0.0 1079800
## X19555 18650 30300 64000 0 125900.0 383050
## X10520 2300 0 0 0 600.0 1700
## X18705 16550 15200 111000 0 142000.0 129750
## X5095 60100 9600 333000 0 0.0 402700
## X11010 250 5800 0 0 840.0 5210
## X3540 760 4100 0 0 30.0 4830
## X14950 162700 3700 67000 800 43300.0 223900
## X4830 1350 4800 6000 0 0.0 12150
## X2865 167500 9300 75000 0 0.0 251800
## X20945 122710 29000 110000 0 159500.0 242210
## X13040 19880 13500 0 0 4400.0 28980
## X4515 135700 7900 0 0 780.0 142820
## X145 40 0 0 0 400.0 -360
## X18685 2480 8600 40000 0 5100.0 45980
## X17585 92100 22000 43000 0 92000.0 148900
## X10090 0 0 0 0 1300.0 -1300
## X13235 25000 0 0 0 0.0 525000
## X3045 287000 14700 34000 0 137100.0 324600
## X21425 90 0 0 0 5450.0 -5360
## X11840 0 0 0 0 6000.0 -6000
## X3400 25100 9900 44000 0 16600.0 73400
## X6635 25600 4200 5000 0 2100.0 32700
## X19815 80900 4700 0 0 300.0 85300
## X19565 7350 11800 9000 0 84770.0 26380
## X12135 67650 38900 108000 0 181400.0 168850
## X10700 640000 22000 100000 0 0.0 836000
## X2600 54700 4600 32000 0 98000.0 91300
## X2860 4830 2800 0 0 650.0 6980
## X2175 3600 0 0 0 900.0 2700
## X14915 72800 28800 15000 0 78100.0 108500
## X66351 25600 4200 5000 0 2100.0 32700
## X6575 21000 38000 19000 0 16000.0 709000
## X8410 12800 30000 18000 0 73000.0 39800
## X7230 360350 11000 60000 0 0.0 493850
## X12955 1000 16020 31000 0 20900.0 77120
## X19205 57550 13000 136000 0 60200.0 200350
## X600 9190 19700 27000 8000 139900.0 46990
## X1290 4750 6300 0 0 13400.0 -2350
## X17070 1189000 47000 425000 20000 340000.0 2031000
## X16140 226800 12000 100000 0 0.0 338800
## X17935 36000 31100 102000 0 93500.0 301100
## X3605 92000 17700 0 0 8800.0 100900
## X10275 274000 26500 106000 5000 180600.0 404900
## X19930 32450 23000 178000 0 40000.0 225450
## X15360 137000 40000 185000 0 162900.0 444100
## X1075 158090 14300 37000 2500 0.0 223890
## X7770 172600 6100 65000 0 0.0 243700
## X1010 2700 0 21000 0 108000.0 23700
## X7095 660 0 0 0 0.0 660
## X14255 2960 1900 33000 0 940.0 42920
## X20075 30000 7100 170000 0 11030.0 196070
## X2610 570 2700 0 0 1200.0 2070
## X965 49600 3400 0 0 20.0 52980
## X17515 37000 5000 115000 0 21200.0 155800
## X1755 0 0 0 0 0.0 0
## X16440 329500 81800 91000 0 59000.0 509300
## X14750 9200 2300 12000 0 62000.0 19500
## X16960 82800 31400 100000 0 115000.0 199200
## X575 2000 3800 0 0 0.0 5800
## X12340 317500 5600 217000 0 0.0 540100
## X3250 300 3800 0 0 0.0 4100
## X21805 1325001 15000 570000 25000 188000.0 2087001
## X17860 42400 30000 90000 0 72550.0 164850
## X6260 379000 31000 389000 0 254400.0 794600
## X8435 7160 14000 0 0 13480.0 7680
## X10795 151530 22800 125000 0 0.0 299330
## X9785 6650 11800 15000 0 55990.0 32460
## X17455 6000 2100 23000 0 52500.0 30600
## X11275 351300 12500 200000 0 0.0 563800
## X6785 61005 0 38000 0 93040.0 90965
## X12920 0 0 0 0 0.0 0
## X12685 49650 7400 62000 0 18000.0 114050
## X7575 31700 6800 0 0 300.0 38200
## X16745 3300 5000 2000 0 79800.0 3500
## X3925 0 13200 0 0 20000.0 -6800
## X13715 1140 0 0 0 0.0 1140
## X2630 50500 38000 7000 0 220500.0 224000
## X1880 1220 3100 0 0 1000.0 8320
## X16810 1520 7500 0 0 14100.0 -5080
## X7535 31251 16100 0 0 24700.0 24951
## X17395 0 0 0 0 0.0 0
## X20265 19220 4300 0 0 0.0 23520
## X16645 105330 9900 276000 0 254000.0 386230
## X18180 565400 24600 209000 0 840.0 798160
## X4825 1000 11300 0 0 3020.0 9280
## X1845 168000 21900 250000 0 18600.0 539300
## X5425 22800 21800 80000 0 220300.0 124300
## X10600 214000 54800 89000 30000 12800.0 386000
## X10360 0 0 27000 0 10000.0 27000
## X19890 19300 8800 0 0 5600.0 22500
## X20500 0 0 163000 0 0.0 163000
## X2565 430000 26600 70000 0 5000.0 526600
## X26002 54700 4600 32000 0 98000.0 91300
## X19845 3640 17000 0 0 13000.0 7640
## X18965 88600 42600 67000 0 28000.0 198200
## X11230 0 0 0 0 0.0 0
## X11260 173900 11600 420000 0 300.0 605200
## X3200 45740 41200 201200 0 60500.0 236440
## X5965 24020 7300 0 0 350.0 30970
## X107953 151530 22800 125000 0 0.0 299330
## X11035 51400 2500 60000 0 0.0 113900
## X18245 35800 122300 98800 0 104800.0 491300
## X11955 20000 27900 3000 0 47600.0 80300
## X9345 80 0 0 0 0.0 80
## X2320 25970 18100 118000 0 36680.0 147390
## X9295 46400 6000 290100 0 13100.0 339300
## X20110 16900 0 8000 0 167750.0 24150
## X680 327800 43000 220000 0 0.0 590800
## X13270 81350 9800 0 0 6500.0 84650
## X3075 78050 7800 29000 0 110840.0 71010
## X13160 85100 8800 37000 0 103000.0 120900
## X20435 780000 0 850000 0 850000.0 2780000
## X12465 783000 12100 170000 0 30000.0 1015100
## X4440 760 0 0 0 0.0 760
## X3870 78700 9400 20000 0 9800.0 98300
## X3510 7900 47500 29000 0 86300.0 73100
## X13795 0 0 0 0 0.0 0
## X18155 9000 18200 0 0 68300.0 76900
## X4685 115700 28700 141000 0 73900.0 270500
## X20135 770 17100 0 0 3610.0 14260
## X7975 40000 3500 210000 0 0.0 253500
## X16425 116300 23600 37000 0 201120.0 178780
## X84354 7160 14000 0 0 13480.0 7680
## X12905 196000 9000 58000 0 164000.0 298000
## X15095 133010 34200 0 0 0.0 167210
## X3625 0 0 10000 0 0.0 10000
## X198455 3640 17000 0 0 13000.0 7640
## X570 57001 8400 20000 0 11200.0 74201
## X21195 279390 21700 125000 0 0.0 506340
## X16470 400060 26000 380000 0 0.0 2856060
## X14880 6340 9900 106000 0 49000.0 98240
## X9485 11405 21700 47000 0 37950.0 55155
## X17090 700 4500 0 0 0.0 5200
## X9670 55500 31000 170000 0 139600.0 596900
## X15945 0 10100 0 0 70.0 10030
## X13535 400 22800 0 0 18800.0 4400
## X3685 17370 6400 4800 0 11360.0 25410
## X540 4930 5300 46000 0 258800.0 88930
## X17780 253900 13000 112000 0 28000.0 378900
## X21100 903000 9300 109000 0 34000.0 1093300
## X4310 2800 7700 25000 0 147100.0 3400
## X2010 1200 3000 20000 0 100000.0 24200
## X8785 406000 28800 80000 0 0.0 787800
## X1045 8500 6700 62000 0 23720.0 76480
## X2935 1500 0 70000 0 0.0 71500
## X11195 135000 2200 0 0 2300.0 134900
## X110356 51400 2500 60000 0 0.0 113900
## X3410 88711 25600 62000 0 38700.0 175611
## X17765 12000 4500 0 0 0.0 16500
## X9175 22200 31900 25000 0 107700.0 46400
## X6395 19000 12000 0 0 520.0 30480
## X485 5400 12600 0 0 4140.0 13860
## X870 52440 50700 149000 0 174000.0 208140
## X9220 40 0 0 0 300.0 -260
## X1920 400 6300 28000 0 1600.0 33100
## X19230 169350 18200 300000 0 0.0 487550
## X18475 875950 9100 240000 0 0.0 1125050
## X5895 1120 1700 0 0 0.0 2820
## X3695 40100 5900 8000 0 169000.0 54000
## X17075 126500 13000 10000 0 96700.0 92800
## X21685 800 13900 23000 0 39200.0 24500
## X10410 18300 15000 0 0 0.0 86300
## X1350 0 0 0 0 0.0 0
## X18760 61400 17000 124000 0 303000.0 487900
## X3405 825 13000 0 0 28800.0 -14975
## X12035 1170 8900 28000 0 108800.0 26270
## X305 223800 16200 60000 0 0.0 300000
## X17850 83500 13000 201000 0 129700.0 416800
## X4110 3577000 188000 1400000 0 400000.0 5465000
## X4605 16800 10900 100000 0 0.0 127700
## X12555 0 0 0 0 0.0 0
## X5915 478500 0 410000 0 290000.0 2278500
## X22035 0 0 0 0 0.0 0
## X6930 64000 8000 50000 0 128200.0 303800
## X17060 13800 17000 143000 0 0.0 173800
## X13760 71750 50600 100000 0 10000.0 517350
## X5825 1300 11600 0 0 0.0 12900
## X34057 825 13000 0 0 28800.0 -14975
## X20180 3350 49700 10000 0 156300.0 39750
## X21130 310 4400 83000 0 0.0 87710
## X12205 0 0 0 0 30000.0 -30000
## X1265 7300 9100 85000 0 9100.0 94300
## X13645 108000 17500 200000 0 0.0 325500
## X905 700 0 0 0 0.0 700
## X21995 36300 0 0 0 60.0 36240
## X6975 1100 7600 0 0 870.0 7830
## X16450 20920 12000 0 0 20700.0 12220
## X14840 7000 0 0 0 0.0 7000
## X8300 348950 27200 25000 0 126500.0 370650
## X645 132000 27000 195000 0 155000.0 1074000
## X2770 11400 6300 115600 0 41400.0 96300
## X147508 9200 2300 12000 0 62000.0 19500
## X1540 3560 13000 2000 0 91140.0 3420
## X19435 0 0 0 0 0.0 0
## X6765 10400 38800 32000 0 76200.0 73000
## X54259 22800 21800 80000 0 220300.0 124300
## X19980 45000 13000 100000 0 0.0 158000
## X54010 4930 5300 46000 0 258800.0 88930
## X21890 200650 38400 60000 0 176000.0 263050
## X1220 23200 2900 27000 0 0.0 53100
## X16615 1107000 43700 179000 0 103400.0 1622300
## X16905 680430 9900 200000 0 0.0 1020330
## X9050 780200 33000 360000 0 135000.0 1577800
## X21165 426300 79200 60000 70000 143000.0 632500
## X16350 0 0 0 0 0.0 0
## X14085 200 11000 0 0 12500.0 -1300
## X11465 1600 12800 0 0 800.0 13600
## X12610 30000 23100 4000000 120000 0.0 4173100
## X785 18900 24700 0 0 38000.0 20600
## X14485 1860 9300 31000 0 42200.0 41960
## X8580 101400 6700 24000 0 66300.0 128800
## X10340 500 33000 27000 0 93000.0 35500
## X20855 650 0 0 0 0.0 650
## X5420 12310 13700 -7000 0 168930.0 -32920
## X1200 68700 18600 77000 0 2400.0 161900
## X13395 39500 39600 60000 0 274000.0 145100
## X10230 10000 10000 159000 0 41000.0 179000
## X17945 8960 5200 11000 5100 192650.0 -101390
## X565 300 6700 0 0 1200.0 5800
## X18070 39300 1500 160000 0 0.0 200800
## X509511 60100 9600 333000 0 0.0 402700
## X8940 29000 17500 0 0 9800.0 36700
## X11575 154000 4000 211000 0 9950.0 368050
## X1213512 67650 38900 108000 0 181400.0 168850
## X14770 81950 28200 62000 0 139000.0 151150
## X22015 44800 20800 125000 0 13200.0 277400
## X4965 0 3600 0 0 15000.0 -11400
## X1660 148000 15000 133000 0 23800.0 272200
## X20795 4510 5500 0 0 4200.0 8910
## X2045 2000 27000 6000 0 195800.0 -1800
## X10235 395201 0 400000 0 5000.0 790201
## X12060 220 4400 0 0 900.0 3720
## X5680 203700 31100 27000 50000 253610.0 294190
## X20215 43660 9500 0 0 12000.0 41160
## X15375 3200 6400 20000 0 0.0 29600
## X10740 0 0 102000 0 28600.0 101400
## X4160 389500 0 300000 0 30000.0 714500
## X310 2500 2000 0 0 600.0 3900
## X3235 6200 20100 70000 0 24670.0 91630
## X21055 104220 3900 135000 0 7000.0 254120
## X2620 0 0 0 0 0.0 0
## X1600 35820 0 0 5000 5860.0 34960
## X1751513 37000 5000 115000 0 21200.0 155800
## X5765 0 2000 0 0 650.0 1350
## X16945 413300 20800 90000 0 180.0 553920
## X20830 320000 7600 60000 0 580000.0 7547600
## X10105 412500 21000 12000 0 187600.0 420900
## X4895 63100 9980 18000 0 62000.0 91080
## X9895 0 0 0 0 0.0 0
## X10650 100 2600 40000 0 0.0 42700
## X8705 88000 32100 125000 0 25800.0 219300
## X1490 68000 21600 29088 0 0.0 629600
## X341014 88711 25600 62000 0 38700.0 175611
## X1408515 200 11000 0 0 12500.0 -1300
## X16235 8370 52000 9000 0 89400.0 32970
## X2201516 44800 20800 125000 0 13200.0 277400
## X17115 9100 27700 49000 0 16500.0 144300
## X22110 41450 9800 20000 0 111800.0 62450
## X5075 9550 6600 87000 0 0.0 103150
## X3895 515650 20580 346000 0 54080.0 882150
## X18550 4850 15010 77000 0 56910.0 82950
## X1998017 45000 13000 100000 0 0.0 158000
## X10815 8960 9700 90000 0 0.0 148660
## X130 67800 10000 185000 0 40080.0 497720
## X15700 43950 62000 41000 0 190500.0 103450
## X10560 660 3700 0 0 0.0 4360
## X8180 9500 2600 20000 0 0.0 32100
## X6115 58000 7300 0 0 9850.0 55450
## X11495 182500 17100 47000 0 97100.0 222500
## X17710 187500 24000 125000 6000 0.0 342500
## X10510 0 0 0 0 0.0 0
## X10990 12500 11200 18000 0 31100.0 38600
## X13300 5220 5200 30000 0 93340.0 37080
## X19315 5605 11100 15000 0 72500.0 24205
## X10685 0 3400 0 0 2500.0 900
## X19330 188600 7300 60000 13000 101200.0 267700
## X16260 10 2100 0 0 0.0 2110
## X13945 13600 7900 0 0 98000.0 18500
## X2330 109400 14000 95000 0 100110.0 288290
## X12080 760 6100 0 0 11770.0 -4910
## X16900 2940400 60900 400000 0 124000.0 3697300
## X1080 108810 33000 52000 0 41300.0 320510
## X19180 867350 41500 301000 0 137000.0 1196850
## X2925 561000 13000 182000 9000 168000.0 765000
## X7555 131900 4900 80000 0 0.0 216800
## X16600 85700 37900 122000 0 36300.0 317300
## X16795 8450 38000 5000 0 152250.0 36200
## X16545 132500 17900 110000 0 8050.0 252350
## X20245 2000 22700 3000 0 36490.0 15210
## X9180 137000 36400 33000 0 102500.0 185900
## X16480 448100 38100 239000 50000 16000.0 825200
## X17355 67010 30000 490000 0 120000.0 527010
## X5875 7000 1300 40000 0 0.0 48300
## X16145 26200 11000 137000 0 40500.0 171700
## X21770 0 0 0 0 0.0 0
## X11820 298601 18000 44000 0 91000.0 360601
## X9390 50000 2500 59000 0 0.0 111500
## X12520 6000 0 0 0 0.0 6000
## X12040 128500 18000 300000 0 0.0 637500
## X8890 27700 37300 17000 0 98300.0 83900
## X150 7100 14000 10000 0 45330.0 29770
## X4600 21300 64000 72000 0 28000.0 183300
## X12490 1540 6300 16000 0 42800.0 20040
## X5640 11200 2900 31300 0 5700.0 45400
## X1758518 92100 22000 43000 0 92000.0 148900
## X3105 7885000 92000 900000 0 0.0 9102000
## X1070019 640000 22000 100000 0 0.0 836000
## X7035 26000 2500 63000 0 114000.0 124500
## X19950 949410 38100 30000 0 148750.0 1183760
## X12835 2515 5700 0 0 9540.0 22675
## X12050 5300 9100 -2000 0 84400.0 12000
## X12605 80210 48900 25000 0 119700.0 99410
## X16605 48500 21000 225000 0 0.0 294500
## X100 0 0 0 0 0.0 0
## X21095 386100 24300 70000 0 142560.0 454840
## X9640 14500 1200 21000 0 69260.0 36440
## X18820 2251700 121000 301000 2500000 157050.0 5165650
## X20565 384360 30900 148000 0 187000.0 478260
## X13035 449800 19600 175000 0 0.0 644400
## X15555 2187200 179000 170000 2500000 0.0 13036200
## X8315 286000 8000 300000 0 0.0 664000
## X990 13500 15700 6000 0 91100.0 33100
## X19305 20400 0 0 0 0.0 70400
## X14165 398500 15500 0 0 0.0 564000
## X1785 17300 5400 95000 0 26000.0 118700
## X22045 42000 12000 101000 0 42000.0 137000
## X31520 5400 21000 8000 0 58640.0 17760
## X21535 71150 4000 39000 0 11000.0 114150
## X8005 60000 2000 0 0 0.0 132000
## X21855 3230 4000 0 0 3800.0 78430
## X2965 93500 11600 33000 0 67930.0 137170
## X19925 150700 18600 85000 0 7400.0 246900
## X21305 0 0 0 0 0.0 0
## X11315 1450900 18400 173800 20000 1200.0 1663100
## X2870 200 16300 15300 0 3700.0 31800
## X7845 200 21000 -9000 0 72200.0 -16000
## X11325 30700 29000 39000 0 113400.0 55300
## X7135 28300 0 0 0 21900.0 6400
## X1223521 16000 16000 0 0 31000.0 1000
## X19955 87600 6800 235000 0 0.0 329400
## X12115 27500 0 0 0 1900.0 25600
## X20770 2000 5600 0 0 0.0 7600
## X695 1200 4300 0 0 15570.0 -10070
## X11320 85000 33400 65000 0 90000.0 183400
## X7080 464300 22000 138000 0 0.0 624300
## X21705 2000 17500 6000 0 0.0 25500
## X10855 51700 55500 288000 0 201410.0 355790
## X6340 20100 28600 64000 0 169000.0 99700
## X17450 0 2600 0 0 0.0 2600
## X2895 1001 0 1000 0 550.0 1451
## X8115 716000 5700 200000 0 0.0 921700
## X430 93810 0 0 0 30200.0 63610
## X1027522 274000 26500 106000 5000 180600.0 404900
## X387023 78700 9400 20000 0 9800.0 98300
## X13920 700 12000 0 0 12190.0 510
## X3160 300 2300 0 0 100.0 2500
## X19125 527700 44000 145000 0 30200.0 1066500
## X21480 1953150 44500 700000 220000 0.0 3427650
## X13180 750 4200 0 0 2180.0 2770
## X7970 111000 2500 125000 0 0.0 238500
## X11435 19100 25500 38000 0 118100.0 81500
## X16800 200 0 0 0 160.0 40
## X5575 100 7300 550000 0 0.0 557400
## X9880 700 0 0 0 0.0 700
## X13440 10 1200 0 0 330.0 880
## X17370 600 17400 60000 0 141200.0 66800
## X17200 6000 13200 250000 0 0.0 269200
## X3905 365800 48000 428000 0 123800.0 850000
## X14000 0 0 0 0 0.0 0
## X9710 4600 0 0 0 5000.0 -400
## X5300 282700 13000 100000 0 0.0 395700
## X12985 1200 13820 77000 0 25700.0 691320
## X2007524 30000 7100 170000 0 11030.0 196070
## X15575 4000 4500 112000 0 98200.0 120300
## X4245 400 101900 91000 0 11800.0 190500
## X21505 0 10100 0 0 1200.0 8900
## X4215 530 0 0 0 10000.0 -9470
## X12535 700 3800 165000 0 1200.0 168300
## X16475 6800 6300 60000 0 0.0 73100
## X4570 50 17400 32000 0 40100.0 37350
## X15300 0 0 10000 0 600.0 9400
## X18200 76210 27400 28000 0 68080.0 131530
## X2325 214900 23700 215000 0 151600.0 527000
## X3430 435900 14100 62000 0 3000.0 512000
## X7495 300 4100 31000 0 0.0 35400
## X489525 63100 9980 18000 0 62000.0 91080
## X1680026 200 0 0 0 160.0 40
## X21375 176601 35500 0 0 33200.0 178901
## X11115 14800 12500 1000 0 64000.0 19300
## X5220 22000 6900 0 0 1000.0 27900
## X1488027 6340 9900 106000 0 49000.0 98240
## X21940 18230 15900 174000 0 13700.0 194430
## X1364528 108000 17500 200000 0 0.0 325500
## X21040 11130 16000 0 0 29000.0 -1870
## X7125 39000 36000 70000 0 27710.0 117290
## X8670 510 11200 0 0 10100.0 1610
## X10640 231500 40600 30000 0 118250.0 260850
## X18375 3600 11000 0 0 11220.0 3380
## X20845 6505 20100 4000 0 124900.0 -18295
## X595 685000 17500 137000 0 38000.0 839500
## X1455 227000 15700 184000 0 74200.0 491000
## X8760 0 0 0 0 0.0 0
## X626029 379000 31000 389000 0 254400.0 794600
## X8475 631500 30700 500000 0 0.0 1162200
## X3085 30600 23700 12000 0 56300.0 58000
## X9285 1805000 14000 750000 0 0.0 2569000
## X6940 3580 9800 25000 0 84800.0 29580
## X557530 100 7300 550000 0 0.0 557400
## X16710 4500 20200 44000 0 850.0 67850
## X15515 20 0 0 0 0.0 20
## X3530 108200 12000 130000 0 120000.0 250200
## X6860 2300 0 0 0 0.0 2300
## X14630 6180 9900 0 0 4500.0 11580
## X14705 72760 11000 90000 0 0.0 173760
## X13010 13800 0 0 0 250.0 13550
## X1792031 111500 11000 88000 0 32340.0 200160
## X12705 60030 32500 400000 0 8200.0 484330
## X9870 3000 36800 0 0 21000.0 18800
## X17305 8950 19200 10000 0 75000.0 33150
## X21595 5000 0 2000 0 75400.0 -400
## X13725 24220 10000 70000 0 21000.0 103220
## X10040 0 0 88000 0 0.0 88000
## X7005 29000 16000 58000 0 32900.0 172100
## X3760 42720 24000 0 0 8100.0 58620
## X14910 473000 33000 147000 0 222000.0 624000
## X13365 10 5000 75000 0 0.0 80010
## X11410 2400 3800 261400 0 67600.0 267600
## X12220 992100 88000 250000 60000 250000.0 2390100
## X18420 24450 23100 17000 21000 93000.0 70550
## X9005 100 3200 0 0 10000.0 -6700
## X11855 7200 0 -3000 0 143500.0 -16300
## X21405 0 1800 0 0 0.0 1800
## X21260 14600 9100 30000 0 600.0 53100
## X8020 500 0 0 0 0.0 500
## X10370 4650 5800 10000 0 94100.0 16350
## X15255 3000 8200 50000 0 2000.0 59200
## X12735 45500 18000 153000 0 75300.0 198200
## X2635 133200 28400 32000 0 55000.0 208100
## X4765 11740 9000 0 0 0.0 25740
## X20295 62200 13000 16000 0 171600.0 533600
## X4030 0 3900 0 0 1100.0 2800
## X21360 3730 22300 59000 0 220300.0 75730
## X858032 101400 6700 24000 0 66300.0 128800
## X15240 159600 47000 73900 0 297100.0 389500
## X1052033 2300 0 0 0 600.0 1700
## X8230 5358500 49500 400000 0 10000.0 5843315
## X6565 87890 19400 71000 0 79000.0 178290
## X8210 4000 14500 75000 0 300.0 93200
## X16370 16300 0 0 0 52200.0 34100
## X2495 19500 6000 0 0 150.0 25350
## X4950 79300 9400 89000 0 0.0 206700
## X20625 60000 0 0 0 350000.0 3360000
## X12640 23300 18000 88000 0 173200.0 134100
## X16455 249000 26500 402000 0 186000.0 979500
## X20670 50 3500 0 0 6990.0 -790
## X9855 0 0 0 0 0.0 0
## X7590 3200 6100 35000 0 45100.0 39200
## X10390 160 0 0 0 0.0 160
## X6885 410000 2000 300000 0 18000.0 694000
## X12630 360000 0 0 0 0.0 360000
## X587534 7000 1300 40000 0 0.0 48300
## X6415 115050 32000 135000 0 11190.0 270860
## X13800 241500 20500 150000 0 0.0 412000
## X21210 5500 17000 9000 0 98700.0 8800
## X20775 700 5100 0 0 5000.0 800
## X16165 7200 59800 30000 0 43400.0 253600
## X249535 19500 6000 0 0 150.0 25350
## X18530 57500 15100 135000 0 150410.0 197190
## X1182036 298601 18000 44000 0 91000.0 360601
## X2485 133400 33800 26000 0 127000.0 1508200
## X16785 9900 1700 30000 0 0.0 92600
## X11750 155000 15100 210000 30000 0.0 440100
## X3025 7100 18000 0 0 29000.0 -3900
## X3470 19800 18900 75000 0 13000.0 100700
## X1860 236700 11300 12000 0 325500.0 422500
## X3920 800 5500 0 0 10000.0 -3700
## X19430 5000 3000 25000 0 0.0 33000
## X16535 400 3000 0 0 1240.0 2160
## X13620 22500 5400 0 0 50.0 27850
## X17880 171700 18300 11000 0 85700.0 174300
## X4875 220300 29200 60000 0 335000.0 419500
## X19300 4300 2700 16500 0 10400.0 21600
## X7075 7500 0 0 0 32500.0 -25000
## X15130 2330000 67400 165000 200000 110000.0 3342400
## X13555 77000 47000 8000 0 172000.0 212000
## X8385 400 20700 0 0 11380.0 9720
## X831537 286000 8000 300000 0 0.0 664000
## X1330 3940 20600 0 0 32330.0 1710
## X6710 125000 65200 60000 0 115700.0 194500
## X6055 20500 47000 600000 0 1800.0 1065700
## X20455 1384000 82000 350000 70000 70000.0 2643000
## X2025 558600 43800 441000 0 259000.0 1307480
## X8485 50800 4700 18000 0 52050.0 73450
## X6475 1000 0 0 0 10300.0 -9300
## X4305 1378000 23000 210000 0 0.0 1611000
## X6900 540000 48800 130000 0 0.0 1018800
## X14525 607900 36000 408000 0 468410.0 1253490
## X3070 55020 6200 36000 0 84500.0 96720
## X811538 716000 5700 200000 0 0.0 921700
## X1420 104500 0 0 0 0.0 104500
## X542039 12310 13700 -7000 0 168930.0 -32920
## X18380 748800 20200 30000 0 0.0 1184500
## X4185 51230 6100 90000 0 0.0 147330
## X13830 26020 3200 0 0 14000.0 15220
## X6590 134930 26900 55000 0 141070.0 195760
## X13340 10500 4200 0 0 6080.0 8620
## X5625 197000 32600 185000 0 324570.0 480030
## X9625 285100 11900 258000 0 30000.0 597000
## X12020 24000 2600 180000 0 0.0 206600
## X9580 527330 12300 212000 0 638000.0 1751630
## X277040 11400 6300 115600 0 41400.0 96300
## X50 71000 12000 0 0 25110.0 57890
## X369541 40100 5900 8000 0 169000.0 54000
## X7540 39300 3200 0 15000 0.0 57500
## X1030 2000 0 0 0 0.0 2000
## X14400 0 0 50000 0 0.0 50000
## X7415 61000 6500 96000 0 105300.0 169200
## X3990 0 0 0 0 0.0 0
## X3245 79300 16000 70000 0 0.0 165300
## X2575 23200 5600 0 30000 142120.0 751680
## X9105 1500 5700 15000 0 17400.0 34800
## X7985 3240 48400 63000 22000 118700.0 158440
## X1300 24910 15000 215000 0 0.0 254910
## X4760 144300 33300 259000 20000 181800.0 460800
## X16305 60 2600 0 0 5300.0 -2640
## X21035 216000 4800 70000 0 80800.0 290000
## X2905 43000 0 0 0 0.0 43000
## X1610 100500 14000 50000 0 2000.0 162500
## X3490 31500 17000 37000 0 93000.0 335500
## X16585 50 0 0 0 2300.0 -2250
## X4145 30000 4900 30000 0 0.0 64900
## X3135 0 14000 94000 0 56000.0 108000
## X6000 37500 4300 0 0 0.0 41800
## X10420 10000 15000 0 0 0.0 25000
## X1655 0 10200 32000 0 46900.0 43300
## X10705 4000 0 0 0 0.0 4000
## X11735 388900 48800 230000 0 70000.0 667700
## X6720 1700 4700 0 0 0.0 6400
## X12680 6150 11700 84000 0 36000.0 71850
## X7530 7200 29200 30000 0 12000.0 54400
## X7795 1421 9100 0 0 7800.0 2721
## X1480 23850 14000 0 0 0.0 37850
## X21575 227540 29000 100000 0 153000.0 333540
## X2585 47000 28100 30000 0 80350.0 261050
## X16595 0 0 0 0 0.0 0
## X4040 100 3600 35000 0 150.0 38550
## X1630542 60 2600 0 0 5300.0 -2640
## X16330 12840 4800 0 0 300.0 17340
## X15665 72640 24000 60000 0 26900.0 269740
## X690043 540000 48800 130000 0 0.0 1018800
## X6795 400 17200 55000 0 109900.0 222700
## X21350 3400 3200 113000 1500 17220.0 119880
## X6700 658000 21000 155000 0 95000.0 2343000
## X18665 103000 20300 57000 0 63000.0 180300
## X19580 192800 44000 37000 1200 72900.0 240100
## X20130 16200 26600 6000 0 101100.0 36700
## X10325 5720 6600 0 0 6500.0 5820
## X4130 798780 23000 72000 0 166000.0 1403780
## X5475 36990 37900 175000 0 8220.0 316170
## X1790 1500 22900 150000 0 0.0 196900
## X17480 31650 11600 172000 0 102240.0 210010
## X12830 52600 21000 0 0 10100.0 63500
## X1865 25500 12800 44000 0 32300.0 81000
## X768044 12700 4200 8000 0 92000.0 24900
## X457045 50 17400 32000 0 40100.0 37350
## X11490 28720 25000 200000 0 0.0 253720
## X1146546 1600 12800 0 0 800.0 13600
## X10180 3100 2700 0 0 17500.0 -11700
## X3910 647000 7100 0 0 0.0 786600
## X11565 33750 35000 91000 0 147370.0 291380
## X21825 2100 28000 0 0 19600.0 10500
## X4525 1000 2300 0 0 0.0 3300
## X3060 1300 1800 48000 0 0.0 51100
## X9250 269930 11000 10000 0 80000.0 290930
## X17500 65000 34500 175000 0 450.0 299050
## X1100 0 17700 0 0 7000.0 10700
## X16025 2451 19200 200000 0 150000.0 221651
## X12380 163500 23000 79000 0 147700.0 263800
## X753547 31251 16100 0 0 24700.0 24951
## X12850 9100 18000 0 0 11240.0 15860
## X1955548 18650 30300 64000 0 125900.0 383050
## X19775 2750 3000 0 0 0.0 30750
## X11525 391 0 0 0 810.0 -419
## X2975 600 5000 55000 0 770.0 60830
## X18895 2600 6900 125000 0 23600.0 130900
## X1602549 2451 19200 200000 0 150000.0 221651
## X345 9620 14000 9000 0 50000.0 32620
## X490 18200 29000 41000 0 259000.0 88200
## X14580 22970000 89000 700000 0 2000.0 26208000
## X10875 102500 38000 100000 0 78150.0 222350
## X5270 7390 9800 0 0 31900.0 -14710
## X9400 230 3400 40000 0 2360.0 41270
## X12900 51400 15400 90000 0 380.0 168420
## X4530 43500 2400 75000 0 0.0 120900
## X17670 0 0 130000 0 0.0 130000
## X5440 4330 26700 28000 0 97080.0 33950
## X8875 138530 41600 41000 0 137250.0 232880
## X2060 147940 2800 180000 0 0.0 330740
## X2153550 71150 4000 39000 0 11000.0 114150
## X5080 11500 25100 78000 0 43740.0 81860
## X12500 61300 43000 133000 0 119000.0 235300
## X830 100 0 0 0 0.0 100
## X495051 79300 9400 89000 0 0.0 206700
## X1304052 19880 13500 0 0 4400.0 28980
## X16685 500 6600 0 0 4800.0 2300
## X5695 145900 14000 55000 0 10180.0 204720
## X2026553 19220 4300 0 0 0.0 23520
## X215 101050 4700 70000 0 55000.0 175750
## X7460 77300 24100 5000 0 154000.0 57400
## X21060 100500 0 0 0 0.0 100500
## X3770 160600 19500 87000 0 0.0 267100
## X940054 230 3400 40000 0 2360.0 41270
## X15320 7600 9900 0 0 7940.0 9560
## X96555 49600 3400 0 0 20.0 52980
## X19340 36000 12000 100000 0 0.0 148000
## X1395 1800 20600 0 0 7810.0 14590
## X939056 50000 2500 59000 0 0.0 111500
## X5245 1430 8400 32000 0 3080.0 41750
## X18830 2650 6000 24000 0 33710.0 30940
## X15215 72000 19000 58000 0 132400.0 133600
## X496557 0 3600 0 0 15000.0 -11400
## X12210 23500 5200 0 6000 0.0 34700
## X17560 17700 24000 69000 0 101930.0 99770
## X19625 38700 37200 124000 0 21000.0 199900
## X2530 50000 6100 50000 0 590.0 105510
## X9075 16000 20000 17000 0 80400.0 37600
## X1925 41450 2200 30000 0 0.0 73650
## X21010 140000 23100 92000 0 0.0 263100
## X1745058 0 2600 0 0 0.0 2600
## X17555 5700 4200 38000 0 81000.0 40900
## X2018059 3350 49700 10000 0 156300.0 39750
## X5330 1574000 61000 1200000 0 0.0 4275000
## X2150560 0 10100 0 0 1200.0 8900
## X2970 12600 0 0 0 0.0 12600
## X19190 51900 10000 76000 2000 179200.0 134700
## X12570 5450 5300 10000 0 28480.0 20270
## X1325 6600 16600 56000 0 90700.0 481500
## X4195 70000 26600 67000 0 112000.0 139600
## X20915 3000 5000 105000 0 52000.0 106000
## X14145 749300 52900 47000 0 84000.0 848200
## X13090 300 4500 0 0 10100.0 37200
## X2211061 41450 9800 20000 0 111800.0 62450
## X13062 67800 10000 185000 0 40080.0 497720
## X7190 10 1400 0 0 900.0 510
## X10690 200 0 41000 0 35000.0 41200
## X21495 2000 10700 26000 0 24000.0 38700
## X3745 11370 22900 38000 0 83650.0 60620
## X2315 58310 5100 0 0 500.0 62910
## X3170 160 3000 0 0 0.0 3160
## X10940 293900 9200 250000 0 0.0 553100
## X2116563 426300 79200 60000 70000 143000.0 632500
## X2109564 386100 24300 70000 0 142560.0 454840
## X233065 109400 14000 95000 0 100110.0 288290
## X17530 37000 13800 65000 0 100280.0 115520
## X12410 3084500 59000 876000 0 0.0 4569500
## X1694566 413300 20800 90000 0 180.0 553920
## X1250 205280 48000 115000 0 21550.0 346730
## X1323567 25000 0 0 0 0.0 525000
## X17190 100950 31000 271000 0 134675.0 397275
## X103068 2000 0 0 0 0.0 2000
## X1018069 3100 2700 0 0 17500.0 -11700
## X18295 300 3800 0 0 6000.0 -1900
## X8770 11300 7400 220000 0 0.0 238700
## X585 65300 17000 48000 0 172250.0 120050
## X8750 2000 28100 21000 0 90800.0 1094300
## X13955 7900 15500 222000 0 29400.0 254000
## X18825 8551000 4500 1500000 0 55300.0 35953200
## X12280 41500 46700 23000 0 118500.0 95700
## X21780 161000 10250 12000 0 113200.0 183050
## X17810 246000 41600 132000 0 83000.0 444600
## X2535 226000 3300 90000 0 50050.0 319250
## X1614070 226800 12000 100000 0 0.0 338800
## X17290 32510 0 75700 0 9300.0 108210
## X15400 605700 31000 240000 0 2400.0 2897300
## X1842071 24450 23100 17000 21000 93000.0 70550
## X1650 5000 2700 82000 0 54700.0 438000
## X2050072 0 0 163000 0 0.0 163000
## X360 18900 0 -4000 0 64200.0 13700
## X1895 185850 24700 98000 0 102000.0 308550
## X7350 1020 12300 10000 0 6720.0 16600
## X107573 158090 14300 37000 2500 0.0 223890
## X2204574 42000 12000 101000 0 42000.0 137000
## X10345 20 4000 0 0 64960.0 -60940
## X90575 700 0 0 0 0.0 700
## X75 60000 24900 50000 0 123000.0 112900
## X1550 176860 7300 77000 0 29210.0 251950
## X6040 500 2300 7000 0 750.0 9050
## X3525 820750 17700 320000 0 800.0 1219650
## X6980 169940 34900 220000 0 31660.0 418180
## X178576 17300 5400 95000 0 26000.0 118700
## X16935 9200 0 0 2500 0.0 11700
## X928577 1805000 14000 750000 0 0.0 2569000
## X9615 0 3000 600 0 81400.0 -25400
## X17940 2030 15500 32000 0 9750.0 39780
## X2855 0 0 0 0 0.0 0
## X17900 22590 14100 25000 0 119900.0 -58210
## X1222078 992100 88000 250000 60000 250000.0 2390100
## X2020 37550 34000 98600 0 29870.0 141680
## X10380 0 6700 0 0 9300.0 -2600
## X4000 29100 67400 135000 0 0.0 231500
## X90 5900 24100 20000 0 92800.0 26200
## X19145 124100 6800 62000 0 48000.0 192900
## X9140 3000 2500 0 0 0.0 14500
## X2300 35 5000 0 0 3200.0 1835
## X13560 730 3900 0 0 1000.0 3630
## X1767079 0 0 130000 0 0.0 130000
## X9665 67520 28000 21000 0 96300.0 99220
## X9065 5000 2200 40000 0 6200.0 41000
## X12715 12380 55800 0 0 23430.0 44750
## X1069080 200 0 41000 0 35000.0 41200
## X15150 48000 14000 0 0 16180.0 45820
## X14780 3447000 22000 123000 0 102000.0 3592000
## X3080 123750 24200 67000 0 146500.0 148450
## X1023081 10000 10000 159000 0 41000.0 179000
## X9725 38700 18900 77000 0 56300.0 116300
## X1330082 5220 5200 30000 0 93340.0 37080
## X3215 339000 15300 127000 0 98000.0 481300
## X1069083 200 0 41000 0 35000.0 41200
## X19635 0 0 0 0 601.0 -601
## X14800 3300 15200 26000 0 88000.0 30500
## X2105584 104220 3900 135000 0 7000.0 254120
## X21380 0 0 0 0 1190.0 -1190
## X2024585 2000 22700 3000 0 36490.0 15210
## X11040 10250 11400 38000 0 35600.0 58050
## X12070 23220 16000 69000 0 6400.0 101820
## X3465 70410 36000 64000 0 117000.0 149410
## X20725 0 0 52000 0 0.0 52000
## X15730 49100 33600 84000 0 274440.0 137260
## X17005 603500 22000 300000 0 615000.0 3060500
## X4065 124440 8700 54000 0 6000.0 198140
## X620 300 0 40000 0 0.0 40300
## X1776586 12000 4500 0 0 0.0 16500
## X2715 1900 7200 55000 0 111000.0 58100
## X5210 27300 14000 40000 0 136000.0 75300
## X1303587 449800 19600 175000 0 0.0 644400
## X18625 280210 26000 93000 0 72050.0 392160
## X20460 300 6400 0 0 2100.0 4600
## X4700 20000 12200 96000 0 11400.0 120800
## X256588 430000 26600 70000 0 5000.0 526600
## X4180 2499000 49000 773000 0 302000.0 3259200
## X5760 196450 4900 134000 0 231500.0 354850
## X286589 167500 9300 75000 0 0.0 251800
## X5745 31700 9900 0 0 54500.0 30100
## X5175 360 8300 0 0 2200.0 6460
## X15105 47010 9900 23000 0 30150.0 76760
## X19895 370 18000 0 0 0.0 68370
## X1210 111000 42000 71000 0 81000.0 212000
## X1998090 45000 13000 100000 0 0.0 158000
## X202091 37550 34000 98600 0 29870.0 141680
## X14570 12300 15800 23000 15000 82400.0 49700
## X2100 20480 5300 0 0 0.0 25780
## X1208092 760 6100 0 0 11770.0 -4910
## X21340 28640 22200 0 0 490.0 50350
## X14250 200 0 0 0 46400.0 -7200
## X1719093 100950 31000 271000 0 134675.0 397275
## X2105594 104220 3900 135000 0 7000.0 254120
## X11425 386330 34100 105000 0 125000.0 525430
## X2135095 3400 3200 113000 1500 17220.0 119880
## X13565 40050 17900 55000 0 35000.0 112950
## X17540 691800 31400 425000 0 0.0 1148200
## X2985 19000 4100 0 0 63000.0 23100
## X15070 7680 0 0 0 19260.0 -11580
## X3505 22800 18200 140000 0 0.0 249000
## X15015 1530 13000 30000 0 50930.0 43600
## X16815 500 0 0 0 9600.0 -9100
## X7485 27300 3200 20000 0 64000.0 50500
## X18460 154500 13000 50000 0 53450.0 264050
## X9465 10000 0 0 0 12000.0 60000
## X10825 2550 5500 0 0 0.0 8050
## X8105 197150 10000 66000 6000 189000.0 274150
## X5820 147000 31000 39000 0 92900.0 208100
## X14765 417500 46000 90000 0 1200.0 552300
## X5340 1168010 13000 0 25000 0.0 1286010
## X3720 0 0 0 0 0.0 0
## X4475 47350 0 150000 0 0.0 197350
## X15185 8420 5600 16000 0 55500.0 28520
## X68096 327800 43000 220000 0 0.0 590800
## X13745 0 2900 -6000 0 26000.0 -3100
## X125097 205280 48000 115000 0 21550.0 346730
## X8345 3600 17700 0 0 6830.0 14470
## X2435 285 10000 0 0 1590.0 8695
## X1900 650200 22000 80000 0 0.0 752200
## X11670 40200 0 0 0 0.0 40200
## X18465 63500 25500 91000 0 28000.0 161000
## X20605 38100 2500 0 3000 600.0 43000
## X1127598 351300 12500 200000 0 0.0 563800
## X15815 2100 26960 95000 0 75300.0 267460
## X4465 99600 8500 160000 0 100000.0 319100
## X14585 137500 17600 90000 2000 15700.0 241400
## X12930 24850 10300 10000 0 126000.0 39150
## X3875 57000 10000 0 0 27560.0 39440
## X11340 848300 103100 100000 0 1400.0 1990000
## X4985 116880 20200 106000 0 0.0 243080
## X21245 1 6300 20000 0 1000.0 25301
## X1203599 1170 8900 28000 0 108800.0 26270
## X18640 3150 4500 9000 0 10260.0 6390
## X3875100 57000 10000 0 0 27560.0 39440
## X7570 3650 9100 0 0 9200.0 3550
## X16115 359100 14100 107000 0 50000.0 473200
## X6355 0 6600 0 0 3300.0 3300
## X17615 430 11500 31000 0 117450.0 34480
## X6920 0 0 0 0 0.0 0
## X8960 7370 0 0 0 2500.0 59870
## X2325101 214900 23700 215000 0 151600.0 527000
## X255 43000 8800 345000 0 80000.0 424800
## X19515 58110 19000 0 0 9900.0 74210
## X19205102 57550 13000 136000 0 60200.0 200350
## X15825 1000 14900 60000 0 115000.0 115900
## X9090 19080 26200 68000 0 32200.0 113080
## X4540 9400 0 64000 0 0.0 73400
## X15225 2061430 85000 1130000 0 677900.0 3268530
## X10300 120 0 0 0 0.0 120
## X21650 2300 11400 13332 0 83930.5 103100
## X3780 10 0 0 0 0.0 10
## X13835 2500 27000 20000 0 72300.0 27200
## X5100 139000 13000 273000 0 99900.0 422100
## X11025 22700 24000 40000 0 96000.0 70700
## X13740 37700 9000 19000 0 41000.0 65700
## X17905 7060 9800 3000 0 142140.0 -35280
## X925 2500 13000 0 0 0.0 15500
## X11925 510 0 164900 0 5350.0 165160
## X5210103 27300 14000 40000 0 136000.0 75300
## X14005 0 0 0 0 890.0 -890
## X17815 18600 0 65000 0 0.0 83600
## X14200 23500 14200 89000 0 8500.0 118200
## X2855104 0 0 0 0 0.0 0
## X310105 2500 2000 0 0 600.0 3900
## X15355 108200 16300 10000 0 0.0 134500
## X15135 23800 20900 32000 0 43800.0 71400
## X9020 19950 12000 80000 0 129900.0 102050
## X18630 32820 13000 34000 0 150800.0 50020
## X17315 2505 12500 0 0 4430.0 10575
## X19685 111300 13000 -2000 0 24900.0 112400
## X7100 620 11700 9200 0 4550.0 17770
## X12945106 4000 9400 0 0 1500.0 11900
## X7655 468000 0 353000 0 364000.0 8743300
## X20875 5450 0 0 0 0.0 5450
## X9300 1910 6700 22000 75000 63880.0 101730
## X16905107 680430 9900 200000 0 0.0 1020330
## X155 7100 13900 0 0 104400.0 7640
## X15280 1700 0 0 0 250.0 1450
## X17385 1200 0 0 0 0.0 1200
## X12880 6210 12600 0 0 23780.0 -4970
## X1595 310090 33100 113000 0 102000.0 456190
## X5720 5200 0 26000 0 60100.0 30100
## X17375 339500 28900 170000 0 70000.0 538400
## X11495108 182500 17100 47000 0 97100.0 222500
## X7680109 12700 4200 8000 0 92000.0 24900
## X2590 10170 9900 0 5000 13780.0 11290
## X7200 9700 2400 76000 0 410.0 87690
## X1575 81610 30000 23000 0 88280.0 108330
## X12065 0 0 0 0 0.0 0
## X9715 86500 29000 25000 0 158650.0 136850
## X5065 0 0 0 0 340.0 -340
## X9520 3300 0 0 0 4000.0 -700
## X20565110 384360 30900 148000 0 187000.0 478260
## X2100111 20480 5300 0 0 0.0 25780
## X5175112 360 8300 0 0 2200.0 6460
## X15880 21300 33800 121000 0 161300.0 143800
## X615 1000 7500 0 0 4400.0 4100
## X19490 201000 27900 47000 0 184900.0 244000
## X13850 2400 0 0 0 1100.0 1300
## X14070 21920 5800 78000 0 1600.0 104120
## X16555113 13000 1800 15000 0 0.0 29800
## X21750 1500 12000 40000 0 57000.0 46500
## X1305 105300 21000 75900 0 4100.0 202200
## X5210114 27300 14000 40000 0 136000.0 75300
## X14400115 0 0 50000 0 0.0 50000
## X4120 125500 6700 983400 0 0.0 1819700
## X13600 57500 12000 147000 0 129000.0 205500
## X1670 800 7800 63000 0 157000.0 51600
## X8790 111900 30000 159000 0 133000.0 278900
## X8150 87750 3800 45000 0 51300.0 120250
## X14155 42500 16700 150000 0 0.0 439200
## X17905116 7060 9800 3000 0 142140.0 -35280
## X16735 34700 11000 0 0 3000.0 83700
## X21095117 386100 24300 70000 0 142560.0 454840
## X10280 12700 7000 6200 10000 2100.0 33800
## X8695 5770 6900 0 0 2160.0 10510
## X15485 12000 0 60000 0 195000.0 67000
## X920118 93000 7700 59000 0 66000.0 159700
## X525 10300 12000 50000 0 0.0 72300
## X10740119 0 0 102000 0 28600.0 101400
## X8885 490 13000 0 0 12800.0 690
## X20200 2300 6500 0 0 38200.0 -29400
## X2295 720 6200 112000 0 39000.0 112920
## X14855 800 0 0 0 0.0 800
## X20390 105000 1600 60000 0 160000.0 166600
## X13895 12340 26000 43000 0 68940.0 62400
## X12335 271000 17500 200000 0 2900.0 485600
## X11880 62000 23700 65000 0 1500.0 149200
## X3750 17500 3800 0 0 3000.0 24300
## X16305120 60 2600 0 0 5300.0 -2640
## X11875 42100 1900 50000 0 26300.0 67700
## X7670121 12200 34000 22000 0 60600.0 45600
## X6130 0 950 0 0 0.0 950
## X8050 150 22900 21000 0 55600.0 17450
## X17970 0 0 0 0 0.0 0
## X18735 50 4100 0 0 0.0 4150
## X6920122 0 0 0 0 0.0 0
## X235 20000 12700 0 0 63740.0 -31040
## X5530 231500 20500 0 0 16000.0 236000
## X18720 15400 12120 0 0 0.0 97520
## X2330123 109400 14000 95000 0 100110.0 288290
## X335 11800 5000 0 0 2350.0 14450
## X19405 4840 23500 15000 0 146950.0 36390
## X2615 4100 13400 63000 0 76900.0 78600
## X8060 437200 36100 400000 0 0.0 873300
## X7990 630 5000 0 0 8300.0 -2670
## X17205 1600 0 17000 0 133480.0 18120
## X110 25800 0 11000 0 58000.0 34800
## X16470124 400060 26000 380000 0 0.0 2856060
## X8690 36000 14100 34000 0 85960.0 80140
## X3350 198400 23000 80000 30000 40000.0 331400
## X2440 11070 17000 0 0 23260.0 4810
## X9570 100 3300 0 0 0.0 3400
## X19650 79650 7700 23000 0 113200.0 109150
## X12680125 6150 11700 84000 0 36000.0 71850
## X6175 4800 9800 80000 0 220.0 94380
## X2860126 4830 2800 0 0 650.0 6980
## X21470 600 1900 0 0 2000.0 500
## X9360 13400 10000 0 0 11400.0 12000
## X3235127 6200 20100 70000 0 24670.0 91630
## X10540 1370300 20800 122000 0 0.0 1523100
## X18595 293500 37800 202000 0 218000.0 819300
## X13935 1038000 24800 370000 15000 30000.0 1447800
## X16950 263700 17700 115000 0 4400.0 392000
## X9715128 86500 29000 25000 0 158650.0 136850
## X11980 200 5800 3000 0 17000.0 9000
## X2345 79020 13000 25000 10000 70700.0 117320
## X21130129 310 4400 83000 0 0.0 87710
## X12600 3100 4200 0 0 0.0 7300
## X3470130 19800 18900 75000 0 13000.0 100700
## X9425 19800 2500 0 0 0.0 22300
## X21625 1000 21000 0 0 13000.0 9000
## X13110 32660 0 46000 0 61900.0 689760
## X10765 115770 9400 85000 8000 60000.0 218170
## X10290 163300 12000 72000 0 116000.0 262300
## X20650 276200 1800 400000 0 0.0 678000
## X20680 87500 0 0 0 40110.0 47390
## X20325 130300 18800 17000 0 58260.0 160840
## X15740 750 1900 0 0 0.0 2650
## X10040131 0 0 88000 0 0.0 88000
## X2085 780 17600 0 0 0.0 18380
## X18375132 3600 11000 0 0 11220.0 3380
## X15970 1273000 19000 300000 0 0.0 1592000
## X15490 65850 10850 68568 0 0.0 157900
## X9805 200 0 36000 0 8000.0 28200
## X19805 12070 15700 0 1000 26950.0 1820
## X6710133 125000 65200 60000 0 115700.0 194500
## X20265134 19220 4300 0 0 0.0 23520
## X16850 20 4100 0 0 100100.0 -95980
## X10875135 102500 38000 100000 0 78150.0 222350
## X10560136 660 3700 0 0 0.0 4360
## X19625137 38700 37200 124000 0 21000.0 199900
## X13545 5300 1900 70000 0 15000.0 77200
## X13725138 24220 10000 70000 0 21000.0 103220
## X13385 10000 20000 65000 0 147000.0 474000
## X16935139 9200 0 0 2500 0.0 11700
## X4930 63000 0 65000 0 0.0 308000
## X20930 50830 21200 41000 0 28960.0 103070
## X17100 29300 46400 179000 35000 15000.0 354700
## X18795 42200 14300 57000 0 3000.0 145500
## X1315 1550 0 0 0 170.0 1380
## X3990140 0 0 0 0 0.0 0
## X6590141 134930 26900 55000 0 141070.0 195760
## X10940142 293900 9200 250000 0 0.0 553100
## X17560143 17700 24000 69000 0 101930.0 99770
## X300 16260 7500 0 0 2000.0 21760
## X11475 9610 26600 0 0 54880.0 -18670
## X15370 6640 1700 0 0 1800.0 6540
## X12230 130 2000 0 0 0.0 2130
## X6570 400 9200 50000 0 27550.0 157050
## X13610 90 29600 9200 0 19760.0 29930
## X1940 14500 4000 0 0 19800.0 -1300
Next, assign to income the column INCOME in the cfb data frame, and determine the mean and median income values.
income <- cfb$INCOME
mean(income)
## [1] 63402.66
median(income)
## [1] 38032.7
The first output is the mean income and the second is the median income. Mean income is greater than median income. This indicates there are more small income values than large income values, but some of the large income values are very large.
This ‘skewness’ in the distribution of values can be seen on a histogram. A histogram is a plot that divides the range of values into equal-width intervals (bins) and displays the number of values that fall in each bin.
This is done with the hist() function. Here you request the number of intervals with the breaks = argument. R treats a single number as a suggestion and chooses nearby rounded break points.
hist(income,
breaks = 25)
The distribution is said to be right skewed. It has a long right tail.
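The link between right skew and a mean that exceeds the median can be checked on a small example. The sketch below uses a made-up vector, toy_income, which is an assumption for illustration and not part of the cfb data.

```r
# Hypothetical incomes (not from the cfb data): mostly small values, one very large
toy_income <- c(20, 25, 30, 35, 40, 45, 500)

mean(toy_income)    # pulled upward by the single large value
median(toy_income)  # the middle value, unaffected by how large the largest value is

# TRUE when the mean exceeds the median, as expected for right-skewed values
mean(toy_income) > median(toy_income)
```

Replacing 500 with 50 brings the mean close to the median, which is what you would expect when the long right tail is removed.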
Note: Some packages come with data sets. To see what data is available in a package, type
data(package = "UsingR")
Spread
A simple measure of the spread of data values is the range. The range is given by the minimum and maximum value or by the difference between them.
range(income)
## [1] 0 1541866
diff(range(income))
## [1] 1541866
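Alternatively, taking a measure of central tendency such as the mean as the center of a set of values, you can define spread in terms of deviations from that center.
The sum of the squared deviations from the mean divided by the sample size minus one is the sample variance. Its square root is the standard deviation, which the sd() function computes directly.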
var(income)
## [1] 13070833215
sqrt(var(income))
## [1] 114327.7
sd(income)
## [1] 114327.7
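To see where the variance comes from, it may help to compute it step by step and compare the result with var(). The vector x and the names deviations and var_by_hand below are made up for illustration.

```r
# Illustrative values only (not the income data)
x <- c(2, 4, 4, 6, 9)

deviations <- x - mean(x)                            # deviations from the center (the mean)
var_by_hand <- sum(deviations^2) / (length(x) - 1)   # sum of squares over n - 1

var_by_hand          # sample variance computed by hand
var(x)               # should agree with var_by_hand
sqrt(var_by_hand)    # standard deviation, should agree with sd(x)
```

Dividing by the sample size minus one, rather than by the sample size, is what makes this the sample variance that var() returns.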
To illustrate, consider two sets of test scores.
ts1 <- c(80, 85, 75, 77, 87, 82, 88)
ts2 <- c(100, 90, 50, 57, 82, 100, 86)
Some test score statistics are
mean(ts1)
## [1] 82
mean(ts2)
## [1] 80.71429
var(ts1)
## [1] 24.66667
var(ts2)
## [1] 394.2381
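The two sets have similar means but very different variances. Because variance is in squared units, the standard deviation is often easier to interpret. A short sketch, reusing the same two score vectors:

```r
ts1 <- c(80, 85, 75, 77, 87, 82, 88)
ts2 <- c(100, 90, 50, 57, 82, 100, 86)

sd(ts1)             # spread of the first set, in score points
sd(ts2)             # spread of the second set, in score points
sd(ts2) / sd(ts1)   # how many times more spread out the second set is
```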
Vector types
All the elements of a vector must have the same type. That is, you can’t mix numbers with character strings.
Consider the following character strings.
simpsons <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
simpsons
## [1] "Homer" "Marge" "Bart" "Lisa" "Maggie"
Note that character strings are made with matching quotes, either double, ", or single, '.
If you mix element types within a data vector, all elements will change into the ‘lowest’ common type, which is usually a character. Arithmetic does not work on character elements.
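A quick way to see this conversion is to mix a number with a string and check the class of the result. The vector name mixed is made up for this sketch.

```r
mixed <- c(1, "two", 3)   # numbers placed next to a character string
mixed                     # every element is now a character string
class(mixed)              # "character"

# Arithmetic on character elements fails, so convert back to numbers first.
# as.numeric() gives NA (with a warning) for strings that are not numbers.
suppressWarnings(as.numeric(mixed))
```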
Now return to the hurricane landfall counts.
cD1 <- c(2, 3, 0, 3, 1, 0, 0, 1, 2, 1)
cD2 <- c(0, 5, 4, 2, 3, 0, 3, 3, 2, 1)
Now suppose the National Hurricane Center (NHC) reanalyzes a storm, and that the 6th year of the 2nd decade is a 1 rather than a 0 for the number of landfalls. In this case you type
cD2[6] <- 1
The assignment to the 6th element in the vector cD2 is done by referencing the 6th element of the vector with square brackets [].
It’s important to keep this in mind: Parentheses () are used for functions and square brackets [] are used to get values from vectors (and arrays, lists, etc). REPEAT: [] are used to extract or subset values from vectors, data frames, matrices, etc.
Print out all the elements of the data vector, then the 2nd element, the 4th element, all but the 4th element, and all odd-numbered elements.
cD2
## [1] 0 5 4 2 3 1 3 3 2 1
cD2[2]
## [1] 5
cD2[4]
## [1] 2
cD2[-4]
## [1] 0 5 4 3 1 3 3 2 1
cD2[c(1, 3, 5, 7, 9)]
## [1] 0 4 3 3 2
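The positions inside the square brackets do not have to be typed by hand. As a sketch, the vector landfalls below (a made-up name holding the same ten counts) is subset with positions that are computed from its length.

```r
# The second-decade counts after the reanalysis, stored under a new name
landfalls <- c(0, 5, 4, 2, 3, 1, 3, 3, 2, 1)

landfalls[seq(from = 1, to = length(landfalls), by = 2)]  # odd-numbered elements
landfalls[-c(1, 2)]                                       # all but the first two elements
landfalls[length(landfalls)]                              # the last element
```

Computing the positions this way keeps the code working if the vector later grows or shrinks.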
R’s console keeps a history of your commands. The previous commands are accessed using the up and down arrow keys. Repeatedly pushing the up arrow will scroll backward through the history so you can reuse previous commands.
Many times you wish to change only a small part of a previous command, such as when a typo is made. With the arrow keys you access the previous command then edit it as desired.
Structured data
Sometimes data follow a pattern, for instance the integers 1 through 100. The colon operator : creates simple sequences of consecutive integers.
1:100
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
## [91] 91 92 93 94 95 96 97 98 99 100
rev(1:100)
## [1] 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83
## [19] 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65
## [37] 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47
## [55] 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29
## [73] 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11
## [91] 10 9 8 7 6 5 4 3 2 1
100:1
## [1] 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83
## [19] 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65
## [37] 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47
## [55] 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29
## [73] 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11
## [91] 10 9 8 7 6 5 4 3 2 1
It’s often necessary to specify either the step size and the starting and ending points or the starting and ending points and the length of the sequence. The seq() function does this.
seq(from = 1, to = 9, by = 2)
## [1] 1 3 5 7 9
seq(from = 1, to = 10, by = 2)
## [1] 1 3 5 7 9
seq(from = 1, to = 9, length = 5)
## [1] 1 3 5 7 9
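The two ways of calling seq() also work with non-integer steps and with decreasing sequences. In the sketch below, length.out is the full name of the argument abbreviated as length above.

```r
seq(from = 0, to = 1, by = 0.25)        # fix the step size
seq(from = 0, to = 1, length.out = 5)   # fix the number of values instead
seq(from = 10, to = 1, by = -3)         # a negative step counts down
```

The first two calls return the same five values, and the third stops at the last value that does not pass the end point.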
To create a vector with each element having the same value use the rep() function (replicate). The simplest usage is to replicate the first argument a specified number of times.
rep(1, times = 10)
## [1] 1 1 1 1 1 1 1 1 1 1
rep(1:3, times = 3)
## [1] 1 2 3 1 2 3 1 2 3
More complicated patterns can be repeated by specifying pairs of equal-sized vectors. In this case, each element of the first vector is repeated the corresponding number of times specified by the element in the second vector.
rep(c("long", "short"), times = c(1, 2))
## [1] "long" "short" "short"
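When every element should be repeated the same number of times, the each argument of rep() is a shorter alternative to a vector of counts. A brief sketch:

```r
rep(1:3, each = 2)                      # 1 1 2 2 3 3
rep(1:3, times = c(2, 2, 2))            # the same result written with a vector of counts
rep(c("a", "b"), times = 2, each = 2)   # each is applied first, then times
```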
Asking questions
To find the most landfalls in the first decade, type:
max(cD1)
## [1] 3
Which years had the most?
cD1 == 3
## [1] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Notice the double equals signs (==). This tests each value (element) in cD1 to see if it is equal to 3. The 2nd and 4th values are equal to 3 so TRUEs are returned. Think of this as asking R a question. Is the value equal to 3? R answers all at once with a vector of TRUEs and FALSEs.
How do you get the vector element corresponding to the TRUE values? That is, which years have 3 landfalls?
which(cD1 == 3)## [1] 2 4
The function which.max() returns the position of the first maximum.
which.max(cD1)## [1] 2
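The logical vector returned by == can also be used directly as an index, which gives the years themselves rather than their positions. A minimal sketch, assuming hypothetical vectors landfalls and years (these are not the course data):

```r
# Hypothetical annual landfall counts and the matching years
landfalls <- c(2, 3, 0, 3, 1, 0, 0, 1, 2, 1)
years <- 2001:2010

# TRUE where the count equals the maximum
is_max <- landfalls == max(landfalls)

# Use the logical vector to pick out the matching years
years[is_max]

# which.max() gives only the first position with the maximum
years[which.max(landfalls)]
```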
You might also want to know the total number of landfalls in each decade, the number of years in each decade without a landfall, or the ratio of the mean number of landfalls between the two decades.
sum(cD1)## [1] 13
sum(cD2)## [1] 24
sum(cD1 == 0)## [1] 3
sum(cD2 == 0)## [1] 1
mean(cD2) / mean(cD1)## [1] 1.846154
There are 85% more landfalls during the second decade. Is this increase statistically significant?
To remove an object from the current environment, use the rm() function. This is usually not needed unless you have very large objects (e.g., a million cases).
rm(cD1, cD2)
Tables and summaries
All elements of a vector must be of the same type. For example, the vectors A, B, and C below are constructed as numeric, logical, and character, respectively.
First create the vectors then check the class.
A <- c(1, 2.2, 3.6, -2.8)
B <- c(TRUE, TRUE, FALSE, TRUE)
C <- c("Cat 1", "Cat 2", "Cat 3")
class(A)## [1] "numeric"
class(B)## [1] "logical"
class(C)## [1] "character"
With logical and character vectors the table() function counts the number of occurrences of each distinct value. For instance, let the vector wx denote the weather conditions for five forecast periods as character data.
wx <- c("sunny", "clear", "cloudy", "cloudy", "rain")
class(wx)## [1] "character"
table(wx)## wx
## clear cloudy rain sunny
## 1 2 1 1
The output is a list of the character strings and the corresponding number of occurrences of each string.
As another example, let the vector ss denote the Saffir-Simpson category for a set of five hurricanes.
ss <- c("Cat 3", "Cat 2", "Cat 1", "Cat 3", "Cat 3")
table(ss)## ss
## Cat 1 Cat 2 Cat 3
## 1 1 3
Here the character strings correspond to different intensity levels as ordered categories with Cat 1 < Cat 2 < Cat 3. In this case convert the character vector to an ordered factor with levels. This is done with the function factor().
ss <- factor(ss, ordered = TRUE)
class(ss)## [1] "ordered" "factor"
ss## [1] Cat 3 Cat 2 Cat 1 Cat 3 Cat 3
## Levels: Cat 1 < Cat 2 < Cat 3
The vector object is now an ordered factor. Printing the object results in a list of the elements in the vector and a list of the levels in order. Note: if you do the same for the wx object, the order is alphabetical by default. Try it.
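When alphabetical order is not the order you want, the levels can be listed explicitly with the levels = argument. A sketch for the wx vector, where the ordering from clear to rain is an assumption made only for illustration:

```r
wx <- c("sunny", "clear", "cloudy", "cloudy", "rain")

# Default: levels are sorted alphabetically
wx_default <- factor(wx)
levels(wx_default)

# Explicit level order (an illustrative choice, not a standard scale)
wx_ordered <- factor(wx,
                     levels = c("clear", "sunny", "cloudy", "rain"),
                     ordered = TRUE)
table(wx_ordered)

# Ordered factors support comparisons against a level
wx_ordered < "cloudy"
```

The table now lists the counts in the order given by levels = rather than alphabetically.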
Tuesday, September 6, 2022
Today
- Getting data into R
- Data frames
- Quantiles
- Pipes
More information about how to use RStudio and markdown files is available here: https://www.pipinghotdata.com/posts/2020-09-07-introducing-the-rstudio-ide-and-r-markdown/
Getting your data into R
You need to know two things: (1) where the data are located, and (2) what type of data file it is.
Consider the file US.txt located in your project folder. It is in the same folder as this file (05-Lesson.Rmd). Click on the file name. It opens a file tab that shows a portion of the file.
It is a file with the column headings Year, All, MUS, G, FL, E. Each row is a year and each count is the number of hurricanes making landfall in the United States. All indicates landfall anywhere in the continental U.S., MUS indicates landfall at major hurricane intensity (at least 50 m/s), and G, FL, and E indicate landfalls along the Gulf coast, in Florida, and along the East coast, respectively. Columns are separated by spaces.
To create a data object you use the readr::read_table() function. The only required argument is file =.
You put the name of the file in quotes. The first row in the file is not data, so it is used for the column names. This is the default behavior (col_names = TRUE).
LH.df <- readr::read_table(file = "US.txt")##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## All = col_double(),
## MUS = col_double(),
## G = col_double(),
## FL = col_double(),
## E = col_double()
## )
A data object called LH.df is now in your Environment under Data.
In this case the file name is simple because US.txt is in the same directory as your Rmd file.
Data files for an analysis are often kept somewhere else. Here, for example, note the folder called data. Click on the folder name. To read the data from that location, change the file string to "data/US.txt".
LH.df <- readr::read_table(file = "data/US.txt")##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## All = col_double(),
## MUS = col_double(),
## G = col_double(),
## FL = col_double(),
## E = col_double()
## )
The file = argument is where R looks for your data.
If you get an error message it is likely because the data file is not where you think it is.
Note: No changes are made to your original data file.
If there are missing values in the data file they should be coded as NA. If they are coded as something else then you specify the coding with the na = argument. For example, if the missing value character in our file is coded as 99, you specify na = "99".
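The effect of the na = argument can be checked on a tiny file. This is a sketch only: the file contents are made up, and the file is written to a temporary location so nothing in your project folder is touched.

```r
# Write a small space-delimited file (hypothetical values) to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("Year FL",
             "1851 1",
             "1852 99",
             "1853 0"), tmp)

# Tell read_table() that the string 99 codes a missing value
small.df <- readr::read_table(file = tmp, na = "99")
small.df

# Count the missing values in the FL column
sum(is.na(small.df$FL))
```

The match is made on the whole field, so a value such as 1899 would not be turned into NA.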
The readr::read_csv() function has settings that are suitable for comma delimited (csv) files that have been exported from a spreadsheet.
A workflow might include exporting data from a spreadsheet using the csv file format and then importing it to R using the readr::read_csv() function.
You import data from the web by specifying the URL instead of the local file name.
loc <- "http://myweb.fsu.edu/jelsner/temp/data/US.txt"
LH.df <- readr::read_table(file = loc)##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## All = col_double(),
## MUS = col_double(),
## G = col_double(),
## FL = col_double(),
## E = col_double()
## )
Recall that you reference the columns using the $ syntax. For example, type
LH.df$FL## [1] 1 2 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 2 0 2 1 0 1 2 1 0 3 0 2 0 0 0 3 1
## [38] 2 0 0 0 0 1 2 0 3 1 1 1 0 1 0 1 0 0 2 0 0 1 1 1 0 0 0 1 2 1 0 1 0 1 0 0 2
## [75] 1 2 0 2 1 0 0 0 2 2 2 1 0 0 1 0 1 2 0 1 2 1 2 2 1 2 0 0 1 0 0 1 0 0 0 1 0
## [112] 0 0 3 1 2 1 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 2 0 1 0 1 0 0 1 0 0 2 0 0 2
## [149] 1 0 0 0 0 4 3 0 0 0 0 0 0 0 0 0 0 1
sum(LH.df$FL)## [1] 110
The number of years with 0, 1, 2, … Florida hurricanes is obtained by typing
table(LH.df$FL)##
## 0 1 2 3 4
## 93 43 24 5 1
There are 93 years without a FL hurricane, 43 years with one hurricane, 24 years with two hurricanes, and so on.
Creating structured data files
https://environmentalcomputing.net/getting-started-with-r/
Golden rules of data entry.
Convert unstructured data files (e.g., data stored in PDF forms) to structured data. https://www.youtube.com/watch?v=yBkHfIO8YJk
Data frames
The functions readr::read_table() and readr::read_csv() import data into our environment as a data frame. For example, LH.df is a data frame. You see the data object is a data frame in your Environment.
A data frame is like a spreadsheet. Values are arranged in rows and columns. Rows are the cases (observations) and columns are the variables.
The dim() function returns the size of the data frame in terms of how many rows (first number) and how many columns.
dim(LH.df)## [1] 166 6
There are 166 rows and 6 columns in the data frame.
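The two numbers returned by dim() are also available separately. A short sketch using the built-in airquality data frame, chosen here only so that the code runs without downloading anything:

```r
# Size of a data frame: rows first, then columns
dim(airquality)

# The same information one piece at a time
n_rows <- nrow(airquality)
n_cols <- ncol(airquality)
n_rows
n_cols

# Column names
names(airquality)
```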
Note the use of inline code. Open with a single back tick (grave accent) followed by the letter r and close with a single back tick. Inline code allows content in your report to be dynamic. There is no need to retype values when the data changes. Open 05-Lesson.html in a browser.
To list the first six lines of the data object, type
head(LH.df)## # A tibble: 6 × 6
## Year All MUS G FL E
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1851 1 1 0 1 0
## 2 1852 3 1 1 2 0
## 3 1853 0 0 0 0 0
## 4 1854 2 1 1 0 1
## 5 1855 1 1 1 0 0
## 6 1856 2 1 1 1 0
The columns include year, number of hurricanes, number of major hurricanes, number of Gulf coast hurricanes, number of Florida hurricanes, and number of East coast hurricanes in order. Column names are printed as well.
The last six lines of the data frame are listed similarly using the tail() function. The number of lines listed is changed using the argument n =.
tail(LH.df, n = 3)## # A tibble: 3 × 6
## Year All MUS G FL E
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2014 1 0 0 0 1
## 2 2015 0 0 0 0 0
## 3 2016 2 0 0 1 1
The number of years in the record is assigned to the object nY and the annual average number of hurricanes (rate) is assigned to the object rate.
nY <- length(LH.df$All)
rate <- mean(LH.df$All)
By typing the names of the saved objects, the values are printed.
nY## [1] 166
rate## [1] 1.668675
Thus over the 166 years of data the average number of hurricanes per year is 1.67.
If you want to change the names of the columns in the data frame, type
names(LH.df)[4] <- "GC"
names(LH.df)## [1] "Year" "All" "MUS" "GC" "FL" "E"
This changes the 4th column name from G to GC. Note that this change occurs to the data frame in R and not to your original data file.
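Renaming by position breaks if the column order changes. One alternative is to look the column up by its current name. A minimal sketch on a made-up data frame (toy.df and its values are hypothetical):

```r
# A small hypothetical data frame with a column named G
toy.df <- data.frame(Year = 1851:1853, G = c(0, 1, 0))

# Rename the column called G, wherever it happens to be
names(toy.df)[names(toy.df) == "G"] <- "GC"
names(toy.df)
```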
You will work almost exclusively with data frames. A data frame has rows and columns.
- Columns have names
- Columns are vectors
- Columns must be of the same length
- All values within a column must be of the same data type (different columns can have different types)
Each element is indexed by a row number and a column number in that order and separated by a comma. So if df is a data frame then df[2, 3] is the second row of the third column.
To print the second row of the first column of the data frame LH.df you type
LH.df[2, 1]## # A tibble: 1 × 1
## Year
## <dbl>
## 1 1852
If you want all the values in a column, you leave the row number blank.
LH.df[ , 1]## # A tibble: 166 × 1
## Year
## <dbl>
## 1 1851
## 2 1852
## 3 1853
## 4 1854
## 5 1855
## 6 1856
## 7 1857
## 8 1858
## 9 1859
## 10 1860
## # … with 156 more rows
You can also reference the column by name LH.df$Year.
Data frames have two indexes indicating the rows and columns in that order.
LH.df[10, 4]## # A tibble: 1 × 1
## GC
## <dbl>
## 1 3
To a statistician a data frame is a table of observations. Each row contains one observation. Each observation must contain the same variables. These variables are called columns, and you can refer to them by name. You can also refer to the contents of the data frame by row number and column number (like a matrix).
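The different ways of referring to the contents of a data frame can be compared side by side. This sketch uses the built-in airquality data frame, which is a base R data frame rather than a tibble, so a single cell prints as a bare value instead of a 1 × 1 tibble:

```r
# Row 2, column 3 by position (column 3 is Wind)
by_position <- airquality[2, 3]

# The same cell using the column name
by_name <- airquality[2, "Wind"]

# The same cell by first extracting the column as a vector
by_dollar <- airquality$Wind[2]

by_position
identical(by_position, by_name)
identical(by_name, by_dollar)

# Several rows and columns at once
airquality[1:3, c("Month", "Day")]
```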
To an Excel user a data frame is a worksheet (or a range within a worksheet). A data frame is more restrictive in that each column can only be of one data type (e.g., character, numeric, etc).
As an example, consider monthly precipitation from the state of Florida. Source: Monthly climate series. http://www.esrl.noaa.gov/psd/data/timeseries/. Get monthly precipitation values for the state back to the year 1895. Copy/paste into a text editor (notepad) then import using the readr::read_table() function.
Here I did it for Florida and put the file on my website. Missing values are coded as -9.900 so you add the argument na = "-9.900" to the function.
loc <- "http://myweb.fsu.edu/jelsner/temp/data/FLprecip.txt"
FLp.df <- readr::read_table(loc, na = "-9.900")##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## Jan = col_double(),
## Feb = col_double(),
## Mar = col_double(),
## Apr = col_double(),
## May = col_double(),
## Jun = col_double(),
## Jul = col_double(),
## Aug = col_double(),
## Sep = col_double(),
## Oct = col_double(),
## Nov = col_double(),
## Dec = col_double()
## )
Plot a time series graph.
library(ggplot2)
ggplot(data = FLp.df, aes(x = Year, y = Jan)) +
geom_line() +
ylab("Inches") +
ggtitle(label = "January Precipitation in Florida",
subtitle = "1895-2011")
A minimal, complete, reproducible example.
Quantiles
The median value cuts a set of ordered data values into two equal parts: values larger than the median and values less than the median. The ordering comes from arranging the data from lowest to highest.
Quantiles cut a set of ordered data into an arbitrary number of equal-sized parts. The quantile corresponding to cutting the data into two halves is called the median. Fifty percent of the data have values less than or equal to the median value. The median is the 50th percentile (.5 quantile).
Quantiles corresponding to cutting the ordered data into quarters are called quartiles. The lower (first) quartile cuts the data into the lower 25% and upper 75% of the data. The lower quartile is the .25 quantile or the 25th percentile indicating that 25% of the data have values less than this quantile value.
Correspondingly, the upper (third) quartile is the .75 quantile (75th percentile), indicating that 75% of the data have values less than this quantile value.
The quantile() function calculates quantiles on a vector of data. For example, consider Florida precipitation for the month of June. First apply the sort() function on the June values (column indicated by the label Jun).
sort(FLp.df$Jun)## [1] 2.303 2.445 3.292 3.643 3.673 3.898 3.908 4.089 4.202 4.401
## [11] 4.500 4.598 4.739 4.747 4.820 4.838 4.965 5.098 5.099 5.160
## [21] 5.182 5.221 5.321 5.349 5.362 5.422 5.440 5.531 5.588 5.602
## [31] 5.607 5.614 5.696 5.718 5.724 5.752 5.803 5.866 5.887 5.896
## [41] 5.931 5.971 5.998 6.142 6.147 6.171 6.220 6.258 6.269 6.281
## [51] 6.351 6.392 6.392 6.470 6.540 6.541 6.591 6.739 6.789 6.900
## [61] 6.991 6.998 7.002 7.009 7.012 7.049 7.057 7.098 7.118 7.208
## [71] 7.306 7.348 7.450 7.451 7.481 7.666 7.707 7.748 7.876 8.000
## [81] 8.040 8.158 8.168 8.243 8.317 8.378 8.389 8.432 8.488 8.578
## [91] 8.663 8.874 8.880 8.940 8.969 8.976 9.106 9.308 9.349 9.481
## [101] 9.734 9.865 9.939 9.993 10.032 10.276 10.280 10.288 10.309 10.360
## [111] 10.529 10.858 11.014 11.228 11.824 12.034 12.379
Again, note the use of the dollar sign to indicate the column in the data frame.
To find the 50th percentile you use the median() function directly or the quantile() function and specify the quantile with the probs = argument.
median(FLp.df$Jun)## [1] 6.789
quantile(FLp.df$Jun,
probs = .5)## 50%
## 6.789
To retrieve the 25th and 75th percentile values
quantile(FLp.df$Jun,
probs = c(.25, .75))## 25% 75%
## 5.602 8.432
Of the 117 June precipitation values, 25% are less than 5.6 inches, 50% are less than 6.79 inches, and 75% are less than 8.43 inches.
Thus there are about as many years with June precipitation between 5.6 and 6.79 inches as there are years with June precipitation between 6.79 and 8.43 inches.
The difference between the first and third quartile values is called the interquartile range (IQR). Fifty percent of the data values lie within the IQR. The IQR is obtained using the IQR() function.
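The IQR() function gives the same number as subtracting the two quartiles by hand. A sketch using the Temp column of the built-in airquality data frame, chosen only so that the code runs without downloading the precipitation file:

```r
# First and third quartiles of daily temperature
q <- quantile(airquality$Temp, probs = c(.25, .75))
q

# Interquartile range computed by hand and with IQR()
iqr_by_hand <- unname(q[2] - q[1])
iqr_builtin <- IQR(airquality$Temp)
iqr_by_hand
iqr_builtin
```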
Another example: Consider the set of North Atlantic Oscillation (NAO) index values for the month of June from the period 1851–2010. The NAO is a variation in the climate over the North Atlantic Ocean featuring fluctuations in the difference of atmospheric pressure at sea level between Iceland and the Azores.
The index is computed as the difference in standardized sea-level pressures. The standardization is done by subtracting the mean and dividing by the standard deviation. The index has units of standard deviation.
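The standardization step can be sketched in a couple of lines. The pressure values below are hypothetical and stand in for one station's sea-level pressure series. The actual index then takes the difference between two such standardized series.

```r
# Hypothetical sea-level pressures (hPa) at a single station
slp <- c(1012.3, 1008.7, 1015.1, 1010.4, 1006.9)

# Standardize: subtract the mean, divide by the standard deviation
z <- (slp - mean(slp)) / sd(slp)
round(z, 2)

# Standardized values have mean 0 and standard deviation 1
mean(z)
sd(z)
```

Because of the division by the standard deviation, the standardized values no longer carry pressure units, which is why the index is expressed in units of standard deviation.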
First read the data consisting of monthly NAO values, then list the column names and the first few data lines.
loc <- "http://myweb.fsu.edu/jelsner/temp/data/NAO.txt"
NAO.df <- read.table(loc,
header = TRUE)
head(NAO.df)## Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1 1851 3.29 1.03 1.50 -1.66 -1.53 -1.62 -5.39 4.68 1.85 0.78 -1.77 1.74
## 2 1852 1.46 0.41 -2.50 -1.60 0.25 0.09 -1.13 2.94 -2.02 -1.65 -0.93 1.03
## 3 1853 1.31 -4.04 -0.32 0.76 -3.17 1.09 1.76 -2.36 -0.22 -0.47 0.51 -4.28
## 4 1854 1.28 1.72 2.67 0.88 0.04 -0.06 -1.92 -0.03 2.62 1.11 -1.56 2.42
## 5 1855 -1.84 -3.80 -0.05 0.99 -2.28 0.78 -2.61 3.81 0.79 -1.09 -2.42 -1.66
## 6 1856 -1.25 -0.10 -2.27 2.00 -0.70 2.03 -0.16 -0.44 -0.50 1.12 -1.69 -0.23
Determine the 5th and 95th percentile values for the month of June.
quantile(NAO.df$Jun,
probs = c(.05, .95))## 5% 95%
## -2.808 1.891
The summary() function provides summary statistics for each column in your data frame. The statistics include the mean, median, minimum, and maximum, along with the first and third quartile values.
summary(FLp.df)## Year Jan Feb Mar Apr
## Min. :1895 Min. :0.340 Min. :0.288 Min. :0.496 Min. :0.408
## 1st Qu.:1924 1st Qu.:1.798 1st Qu.:2.009 1st Qu.:2.142 1st Qu.:1.659
## Median :1953 Median :2.696 Median :3.099 Median :3.349 Median :2.677
## Mean :1953 Mean :2.916 Mean :3.164 Mean :3.663 Mean :2.926
## 3rd Qu.:1982 3rd Qu.:4.010 3rd Qu.:4.171 3rd Qu.:5.097 3rd Qu.:4.163
## Max. :2011 Max. :8.361 Max. :8.577 Max. :8.701 Max. :7.457
## May Jun Jul Aug
## Min. :0.900 Min. : 2.303 Min. : 4.050 Min. : 4.053
## 1st Qu.:2.483 1st Qu.: 5.602 1st Qu.: 6.427 1st Qu.: 6.164
## Median :3.758 Median : 6.789 Median : 7.522 Median : 7.102
## Mean :3.845 Mean : 7.046 Mean : 7.505 Mean : 7.345
## 3rd Qu.:4.765 3rd Qu.: 8.432 3rd Qu.: 8.358 3rd Qu.: 8.310
## Max. :9.848 Max. :12.379 Max. :11.263 Max. :13.090
## Sep Oct Nov Dec
## Min. : 2.126 Min. :0.471 Min. :0.370 Min. :0.610
## 1st Qu.: 4.930 1st Qu.:2.479 1st Qu.:1.370 1st Qu.:1.549
## Median : 6.680 Median :3.541 Median :2.139 Median :2.558
## Mean : 6.704 Mean :3.803 Mean :2.308 Mean :2.718
## 3rd Qu.: 7.955 3rd Qu.:4.899 3rd Qu.:3.110 3rd Qu.:3.521
## Max. :12.978 Max. :9.556 Max. :6.236 Max. :7.668
Columns with missing values get a row output from the summary() function indicating the number of them (NA’s).
Creating a data frame
The data.frame() function creates a data frame from a set of vectors.
Consider ice volume (10\(^3\) km\(^3\)) measurements from the Arctic from 2002 to 2012. The measurements are taken on January 1st each year and are available from http://psc.apl.washington.edu/wordpress/research/projects/arctic-sea-ice-volume-anomaly/data/
Volume <- c(20.233, 19.659, 18.597, 18.948, 17.820,
16.736, 16.648, 17.068, 15.916, 14.455,
14.569)
Since the data have a sequential order you create a data frame with year in the first column and volume in the second.
Year <- 2002:2012
Ice.df <- data.frame(Year, Volume)
head(Ice.df)## Year Volume
## 1 2002 20.233
## 2 2003 19.659
## 3 2004 18.597
## 4 2005 18.948
## 5 2006 17.820
## 6 2007 16.736
What year had the minimum ice volume?
which.min(Ice.df$Volume)## [1] 10
Ice.df[10, ]## Year Volume
## 10 2011 14.455
Ice.df$Year[which.min(Ice.df$Volume)]## [1] 2011
To change a vector to a data frame use the function as.data.frame(). For example, let counts be a vector of integers.
counts <- rpois(n = 100,
lambda = 1.66)
head(counts)## [1] 1 2 2 3 0 3
H.df <- as.data.frame(counts)
head(H.df)## counts
## 1 1
## 2 2
## 3 2
## 4 3
## 5 0
## 6 3
The column name in the data frame is the name of the vector.
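Because rpois() draws random numbers, the counts you get will differ from those printed above. Setting a seed first makes the draw repeatable. A sketch, where the seed value 1 is an arbitrary choice:

```r
# Fix the random number generator so the draw can be repeated
set.seed(1)  # arbitrary seed, chosen only for illustration
counts <- rpois(n = 100, lambda = 1.66)

# Same seed, same draw
set.seed(1)
counts_again <- rpois(n = 100, lambda = 1.66)
identical(counts, counts_again)

# The data frame column takes the name of the vector
H.df <- as.data.frame(counts)
names(H.df)
```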
Pipes
So far you have computed statistics on data stored as vectors (mean, median, quantiles, etc). But you often import data as data frames so you need to know how to manipulate them.
The {dplyr} package has functions (‘verbs’) that manipulate data frames in a friendly and logical way. Manipulations include selecting columns, filtering rows, re-ordering rows, adding new columns, and summarizing data.
library(dplyr)
Let’s look at these using the airquality data frame. Recall the object airquality is a data frame containing New York air quality measurements from May to September 1973 (?airquality).
head(airquality)## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
dim(airquality)## [1] 153 6
The columns include Ozone (ozone concentration in ppb), Solar.R (solar radiation in langleys), Wind (wind speed in mph), Temp (air temperature in degrees F), Month, and Day.
You summarize the values in each column with the summary() method.
summary(airquality)## Ozone Solar.R Wind Temp
## Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
## 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
## Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
## Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
## 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
## Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
## NA's :37 NA's :7
## Month Day
## Min. :5.000 Min. : 1.0
## 1st Qu.:6.000 1st Qu.: 8.0
## Median :7.000 Median :16.0
## Mean :6.993 Mean :15.8
## 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :9.000 Max. :31.0
##
Note that columns that have missing values are tabulated. For example, there are 37 missing ozone measurements and 7 missing radiation measurements.
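Missing values matter as soon as you compute a statistic on a single column. A short sketch with the Ozone column shows the default behavior and the na.rm = argument that skips the missing values:

```r
# With missing values present, the mean is itself missing
ozone_mean_default <- mean(airquality$Ozone)
ozone_mean_default

# Count the missing values
n_missing <- sum(is.na(airquality$Ozone))
n_missing

# Remove the missing values before averaging
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)
ozone_mean
```

The result with na.rm = TRUE matches the mean reported for Ozone by summary().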
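Importantly you can apply the summary() function using the pipe operator (|> or %>%). The native pipe |> is built into R (version 4.1.0 and later). The %>% pipe comes from the {magrittr} package and is made available when you load {dplyr}.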
airquality |>
summary()## Ozone Solar.R Wind Temp
## Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
## 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
## Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
## Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
## 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
## Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
## NA's :37 NA's :7
## Month Day
## Min. :5.000 Min. : 1.0
## 1st Qu.:6.000 1st Qu.: 8.0
## Median :7.000 Median :16.0
## Mean :6.993 Mean :15.8
## 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :9.000 Max. :31.0
##
You read the pipe as THEN. “airquality data frame THEN summarize.”
The pipe operator allows us to string together functions in a way that makes it easy for humans to understand what was done. This is a key point. You want your code to be readable by a computer (correct syntax) but also readable by other humans.
For example, suppose the object of interest is called me and suppose there is a function called wake_up(). I could apply the function in two ways.
wake_up(me)
me |>
wake_up()
The second way involves a bit more typing but it is easier for a human to read and thus it is easier to understand. This becomes clear when stringing together many functions.
For example, what happens to the result of me after the function wake_up() has been applied? How about get_out_of_bed() and then get_dressed()? Again, I can apply these functions in two ways.
get_dressed(get_out_of_bed(wake_up(me)))
me |>
wake_up() |>
get_out_of_bed() |>
get_dressed()
Continuing
me |>
wake_up() |>
get_out_of_bed() |>
get_dressed() |>
make_coffee() |>
drink_coffee() |>
leave_house()
This is much better in terms of ‘readability’ than leave_house(drink_coffee(make_coffee(get_dressed(get_out_of_bed(wake_up(me)))))).
Consider again the FLp.df. How would you use the above syntax to compute the mean value of June precipitation?
You ask three questions: what function, applied to what variable, from what data frame? Answers: mean(), Jun, FLp.df. You then write the code starting with the answer to the last question first.
FLp.df |>
pull(Jun)## [1] 4.500 11.228 5.221 3.292 5.803 9.993 10.360 6.220 7.012 6.591
## [11] 5.160 8.040 6.392 6.351 6.739 10.288 4.820 12.379 5.531 4.202
## [21] 5.321 6.541 5.362 5.349 7.481 6.258 3.673 6.540 9.308 6.470
## [31] 6.281 8.168 7.450 7.057 8.158 10.858 2.303 8.378 5.182 9.865
## [41] 5.099 8.940 5.931 6.998 9.734 7.049 7.707 10.529 7.348 5.607
## [51] 8.578 7.098 9.106 3.908 8.000 4.089 4.747 3.643 7.876 5.588
## [61] 6.392 5.422 7.748 6.147 8.389 6.789 5.896 8.317 7.118 5.614
## [71] 10.032 8.880 8.488 9.939 6.142 5.866 5.602 8.432 5.887 10.276
## [81] 6.269 7.002 4.401 6.900 3.898 4.838 5.718 10.280 8.969 5.098
## [91] 7.009 7.451 5.696 4.739 8.976 5.724 7.666 12.034 4.598 9.349
## [101] 8.874 7.306 7.208 2.445 9.481 5.971 8.663 10.309 11.014 8.243
## [111] 11.824 5.752 5.998 6.991 6.171 5.440 4.965
The function pull() from the {dplyr} package pulls out the column named Jun as a vector.
Then the mean() function takes these 117 values and computes the average.
FLp.df |>
pull(Jun) |>
mean()## [1] 7.045692
Note that the next function in the sequence receives the output from the previous function as its FIRST argument so the function mean() has nothing inside the parentheses.
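Any further arguments still go inside the parentheses, after the piped value has taken the first position. A sketch using only base R (so airquality$Ozone stands in for pull()), assuming R version 4.1.0 or later for the native pipe:

```r
# The piped vector becomes the first argument; na.rm = is passed as usual
piped <- airquality$Ozone |>
  mean(na.rm = TRUE)

# The equivalent nested call
nested <- mean(airquality$Ozone, na.rm = TRUE)

identical(piped, nested)

# The same idea with quantile() and its probs = argument
airquality$Wind |>
  quantile(probs = c(.1, .9))
```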
Your turn
- Use the piping operator and compute the average wind speed in the airquality data frame.
- Use the piping operator and compute the 10th and 90th percentiles (lower and upper decile values) of May precipitation in Florida.
Thursday, September 8, 2022
Today
- Pipe operator
- Wrangling data
Data wrangling (munging) is the process of transforming data from one format into another to make the data easier to interpret.
The {dplyr} package includes functions that wrangle data frames in a logical way. Key idea: The functions operate on data frames and return data frames.
Operations include selecting columns, filtering rows, re-ordering rows, adding new columns, and summarizing data.
library(dplyr)
Recall the object airquality is a data frame containing New York air quality measurements from May to September 1973 (?airquality).
You get a statistical summary of the values in each column with the summary() method.
summary(airquality)## Ozone Solar.R Wind Temp
## Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
## 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
## Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
## Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
## 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
## Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
## NA's :37 NA's :7
## Month Day
## Min. :5.000 Min. : 1.0
## 1st Qu.:6.000 1st Qu.: 8.0
## Median :7.000 Median :16.0
## Mean :6.993 Mean :15.8
## 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :9.000 Max. :31.0
##
Pipe operator
Importantly you can apply the summary() function using the pipe operator (|>). The native pipe operator is built into R (version 4.1.0 and later) and, when used together with the wrangling functions, it provides an easy way to make code readable.
For example, you read the pipe as THEN. “airquality data frame THEN summarize.”
airquality |>
summary()## Ozone Solar.R Wind Temp
## Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
## 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
## Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
## Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
## 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
## Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
## NA's :37 NA's :7
## Month Day
## Min. :5.000 Min. : 1.0
## 1st Qu.:6.000 1st Qu.: 8.0
## Median :7.000 Median :16.0
## Mean :6.993 Mean :15.8
## 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :9.000 Max. :31.0
##
The pipe operator allows us to string together functions while keeping the code readable. You want your code to be machine readable (correct syntax) but also human readable.
For example, suppose the object of interest is called me and suppose there is a function called wake_up(). I can apply the function in two ways.
wake_up(me)
me |>
wake_up()
The second way involves a bit more typing but it is easier for someone to read and thus it is easier to understand. This becomes clear when stringing together many functions.
For example, what happens to the result of me after the function wake_up() has been applied? How about get_out_of_bed() and then get_dressed()? I can apply these functions in two ways.
get_dressed(get_out_of_bed(wake_up(me)))
me |>
wake_up() |>
get_out_of_bed() |>
get_dressed()
Continuing
me |>
wake_up() |>
get_out_of_bed() |>
get_dressed() |>
make_coffee() |>
drink_coffee() |>
leave_house()
This is much better in terms of ‘readability’ than leave_house(drink_coffee(make_coffee(get_dressed(get_out_of_bed(wake_up(me)))))).
Consider again the FLp.df.
loc <- "http://myweb.fsu.edu/jelsner/temp/data/FLprecip.txt"
FLp.df <- read.table(loc,
header = TRUE,
na.strings = "-9.900")
How would you use the above readable syntax to compute the mean value of June precipitation?
You ask three questions: what function, applied to what variable, from what data frame? Answers: mean(), Jun, FLp.df. You then write the code starting with the answer to the last question first.
FLp.df |>
pull(Jun)## [1] 4.500 11.228 5.221 3.292 5.803 9.993 10.360 6.220 7.012 6.591
## [11] 5.160 8.040 6.392 6.351 6.739 10.288 4.820 12.379 5.531 4.202
## [21] 5.321 6.541 5.362 5.349 7.481 6.258 3.673 6.540 9.308 6.470
## [31] 6.281 8.168 7.450 7.057 8.158 10.858 2.303 8.378 5.182 9.865
## [41] 5.099 8.940 5.931 6.998 9.734 7.049 7.707 10.529 7.348 5.607
## [51] 8.578 7.098 9.106 3.908 8.000 4.089 4.747 3.643 7.876 5.588
## [61] 6.392 5.422 7.748 6.147 8.389 6.789 5.896 8.317 7.118 5.614
## [71] 10.032 8.880 8.488 9.939 6.142 5.866 5.602 8.432 5.887 10.276
## [81] 6.269 7.002 4.401 6.900 3.898 4.838 5.718 10.280 8.969 5.098
## [91] 7.009 7.451 5.696 4.739 8.976 5.724 7.666 12.034 4.598 9.349
## [101] 8.874 7.306 7.208 2.445 9.481 5.971 8.663 10.309 11.014 8.243
## [111] 11.824 5.752 5.998 6.991 6.171 5.440 4.965
The function pull() from the {dplyr} package pulls out the column named Jun and returns a vector of the values.
Then the mean() function takes these 117 values and computes the average.
FLp.df |>
pull(Jun) |>
mean()## [1] 7.045692
IMPORTANT: the next function in the sequence receives the output from the previous function as its FIRST argument so the function mean() has nothing inside the parentheses.
- Use the piping operator and compute the average wind speed in the airquality data frame.
airquality |>
pull(Wind) |>
mean()## [1] 9.957516
- Use the piping operator and compute the 10th and 90th percentiles (lower and upper decile values) of May precipitation in Florida.
FLp.df |>
pull(May) |>
quantile(probs = c(.1, .9))## 10% 90%
## 1.7954 6.0828
Wrangling data frames
You will wrangle data with functions from the {dplyr} package. The functions work on data frames but they work better if the data frame is a tibble. Tibbles are data frames that make life a little easier.
R is an old language, and some things that were useful 10 or 20 years ago now get in the way. To make a data frame a tibble (tabular data frame) type
airquality <- as_tibble(airquality)
class(airquality)## [1] "tbl_df" "tbl" "data.frame"
Click on airquality in the environment. It is a data frame.
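One practical difference between a base data frame and a tibble shows up when you subset a single column with square brackets. A sketch, calling tibble::as_tibble() directly (the same function that {dplyr} makes available as as_tibble()):

```r
# Keep a base data frame and a tibble version side by side
aq.df  <- as.data.frame(airquality)
aq.tbl <- tibble::as_tibble(airquality)

# A base data frame drops to a plain vector when one column is selected
class(aq.df[, "Temp"])

# A tibble stays a tibble
class(aq.tbl[, "Temp"])

# A tibble is still a data frame
is.data.frame(aq.tbl)
```

This is why LH.df[2, 1] earlier printed as a 1 × 1 tibble rather than as a single number.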
Selecting and filtering
The function select() chooses variables by name to create a data frame with fewer columns. For example, choose the month, day, and temperature columns from the airquality data frame.
airquality |>
dplyr::select(Month, Day, Temp)## # A tibble: 153 × 3
## Month Day Temp
## <int> <int> <int>
## 1 5 1 67
## 2 5 2 72
## 3 5 3 74
## 4 5 4 62
## 5 5 5 56
## 6 5 6 66
## 7 5 7 65
## 8 5 8 59
## 9 5 9 61
## 10 5 10 69
## # … with 143 more rows
Suppose you want a new data frame with only the temperature and ozone concentrations.
df <- airquality |>
dplyr::select(Temp, Ozone)
df## # A tibble: 153 × 2
## Temp Ozone
## <int> <int>
## 1 67 41
## 2 72 36
## 3 74 12
## 4 62 18
## 5 56 NA
## 6 66 28
## 7 65 23
## 8 59 19
## 9 61 8
## 10 69 NA
## # … with 143 more rows
You include an assignment operator (<-, left pointing arrow) and an object name (here df).
Note: The result of applying most {dplyr} verbs is a data frame. They take only data frames and return only data frames.
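Columns can also be chosen by exclusion or as a range of adjacent columns. A short sketch on the airquality data frame, where the object names no_ozone and date_cols are made up for illustration:

```r
# Drop one column by putting a minus sign in front of its name
no_ozone <- airquality |>
  dplyr::select(-Ozone)
names(no_ozone)

# Keep a run of adjacent columns with the colon
date_cols <- airquality |>
  dplyr::select(Month:Day)
names(date_cols)

# The result is still a data frame
is.data.frame(no_ozone)
```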
The function filter() chooses observations based on specific values.

Suppose you want only the observations where the temperature is at or above 80F.
airquality |>
dplyr::filter(Temp >= 80)## # A tibble: 73 × 6
## Ozone Solar.R Wind Temp Month Day
## <int> <int> <dbl> <int> <int> <int>
## 1 45 252 14.9 81 5 29
## 2 NA 186 9.2 84 6 4
## 3 NA 220 8.6 85 6 5
## 4 29 127 9.7 82 6 7
## 5 NA 273 6.9 87 6 8
## 6 71 291 13.8 90 6 9
## 7 39 323 11.5 87 6 10
## 8 NA 259 10.9 93 6 11
## 9 NA 250 9.2 92 6 12
## 10 23 148 8 82 6 13
## # … with 63 more rows
The result is a data frame with the same 6 columns but now only 73 observations. Each of the observations has a temperature of at least 80F.
Suppose you want a new data frame keeping only observations where temperature is at least 80F AND winds less than 5 mph.
df <- airquality |>
dplyr::filter(Temp >= 80 & Wind < 5)
df## # A tibble: 8 × 6
## Ozone Solar.R Wind Temp Month Day
## <int> <int> <dbl> <int> <int> <int>
## 1 135 269 4.1 84 7 1
## 2 64 175 4.6 83 7 5
## 3 66 NA 4.6 87 8 6
## 4 122 255 4 89 8 7
## 5 168 238 3.4 81 8 25
## 6 118 225 2.3 94 8 29
## 7 73 183 2.8 93 9 3
## 8 91 189 4.6 93 9 4
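Conditions separated by commas inside filter() are combined with AND, so the comma form and the & form keep the same rows. The | operator gives OR instead. A sketch comparing the three, with made-up object names:

```r
# AND written with &
hot_calm_and <- airquality |>
  dplyr::filter(Temp >= 80 & Wind < 5)

# AND written with a comma between the conditions
hot_calm_comma <- airquality |>
  dplyr::filter(Temp >= 80, Wind < 5)

all.equal(hot_calm_and, hot_calm_comma)

# OR keeps a row when at least one condition holds
hot_or_calm <- airquality |>
  dplyr::filter(Temp >= 80 | Wind < 5)

nrow(hot_calm_and)
nrow(hot_or_calm)
```

The OR version can never return fewer rows than the AND version.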
Example: Palmer penguins
Let’s return to the penguins data set. The data set is located on the web, and you import it as a data frame using the readr::read_csv() function.
loc <- "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv"
penguins <- readr::read_csv(loc)
## Rows: 344 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): species, island, sex
## dbl (5): bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, year
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
penguins
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 334 more rows, and 2 more variables: sex <chr>, year <dbl>
To keep only the penguins labeled female in the column sex, type
penguins |>
dplyr::filter(sex == "female")
## # A tibble: 165 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Adelie Torgersen 39.5 17.4 186 3800
## 2 Adelie Torgersen 40.3 18 195 3250
## 3 Adelie Torgersen 36.7 19.3 193 3450
## 4 Adelie Torgersen 38.9 17.8 181 3625
## 5 Adelie Torgersen 41.1 17.6 182 3200
## 6 Adelie Torgersen 36.6 17.8 185 3700
## 7 Adelie Torgersen 38.7 19 195 3450
## 8 Adelie Torgersen 34.4 18.4 184 3325
## 9 Adelie Biscoe 37.8 18.3 174 3400
## 10 Adelie Biscoe 35.9 19.2 189 3800
## # … with 155 more rows, and 2 more variables: sex <chr>, year <dbl>
To keep only the rows for species other than Adelie penguins, type
penguins |>
dplyr::filter(species != "Adelie")
## # A tibble: 192 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Gentoo Biscoe 46.1 13.2 211 4500
## 2 Gentoo Biscoe 50 16.3 230 5700
## 3 Gentoo Biscoe 48.7 14.1 210 4450
## 4 Gentoo Biscoe 50 15.2 218 5700
## 5 Gentoo Biscoe 47.6 14.5 215 5400
## 6 Gentoo Biscoe 46.5 13.5 210 4550
## 7 Gentoo Biscoe 45.4 14.6 211 4800
## 8 Gentoo Biscoe 46.7 15.3 219 5200
## 9 Gentoo Biscoe 43.3 13.4 209 4400
## 10 Gentoo Biscoe 46.8 15.4 215 5150
## # … with 182 more rows, and 2 more variables: sex <chr>, year <dbl>
When the column of interest is numerical, you can filter rows using a greater-than condition. For example, to create a data frame containing the heaviest penguins, you filter keeping only rows with body mass greater than 6000 g.
penguins |>
dplyr::filter(body_mass_g > 6000)
## # A tibble: 2 × 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Gentoo Biscoe 49.2 15.2 221 6300 male
## 2 Gentoo Biscoe 59.6 17 230 6050 male
## # … with 1 more variable: year <dbl>
You can also filter rows of a data frame with a less-than condition. For example, to create a data frame containing only penguins with short flippers, you filter keeping only rows with flipper length less than 175 mm.
penguins |>
dplyr::filter(flipper_length_mm < 175)
## # A tibble: 2 × 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Adelie Biscoe 37.8 18.3 174 3400 fema…
## 2 Adelie Biscoe 37.9 18.6 172 3150 fema…
## # … with 1 more variable: year <dbl>
You can also specify more than one condition. For example, to create a data frame with female penguins that have long flippers, you filter keeping only rows with flipper length greater than 220 mm and with sex equal to female.
penguins |>
dplyr::filter(flipper_length_mm > 220 &
sex == "female")
## # A tibble: 1 × 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Gentoo Biscoe 46.9 14.6 222 4875 fema…
## # … with 1 more variable: year <dbl>
You can also filter a data frame for rows satisfying either of two conditions using OR (|). For example, to create a data frame with penguins that have long flippers or shallow bills, you filter keeping rows with flipper length greater than 220 mm or with bill depth less than 10 mm.
penguins |>
dplyr::filter(flipper_length_mm > 220 |
bill_depth_mm < 10)
## # A tibble: 35 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Gentoo Biscoe 50 16.3 230 5700
## 2 Gentoo Biscoe 49.2 15.2 221 6300
## 3 Gentoo Biscoe 48.7 15.1 222 5350
## 4 Gentoo Biscoe 47.3 15.3 222 5250
## 5 Gentoo Biscoe 59.6 17 230 6050
## 6 Gentoo Biscoe 49.6 16 225 5700
## 7 Gentoo Biscoe 50.5 15.9 222 5550
## 8 Gentoo Biscoe 50.5 15.9 225 5400
## 9 Gentoo Biscoe 50.1 15 225 5000
## 10 Gentoo Biscoe 50.4 15.3 224 5550
## # … with 25 more rows, and 2 more variables: sex <chr>, year <dbl>
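When the OR conditions all test the same column against different values, the %in% operator is a compact alternative. The sketch below uses the built-in airquality data frame (so that it runs without a download) to keep the observations from May or September. The object names df_or and df_in are made up for illustration.

```r
# Two conditions on the same column joined with OR
df_or <- airquality |>
  dplyr::filter(Month == 5 | Month == 9)

# The same selection written with %in%
df_in <- airquality |>
  dplyr::filter(Month %in% c(5, 9))

nrow(df_or)   # May has 31 days and September has 30
all.equal(df_or, df_in)
```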
Often you want to remove rows if one of the columns has a missing value. With is.na() on the column of interest, you can filter rows based on whether or not a column value is missing.
Note the is.na() function returns a vector of TRUEs and FALSEs.
is.na(airquality$Ozone)
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
## [37] TRUE FALSE TRUE FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE FALSE
## [49] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [73] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
The first four values of the vector Ozone in the airquality data frame are not missing so the function is.na() returns four FALSEs.
When you combine that with the filter() function you get a data frame containing all the rows where is.na() returns a TRUE. For example, create a data frame containing rows where the bill length value is missing.
penguins |>
dplyr::filter(is.na(bill_length_mm))
## # A tibble: 2 × 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Adelie Torge… NA NA NA NA <NA>
## 2 Gentoo Biscoe NA NA NA NA <NA>
## # … with 1 more variable: year <dbl>
Usually you will want to do the reverse of this, that is, keep all the rows where the column value is not missing. In this case use the negation symbol ! to reverse the selection. In this example, you keep the rows with no missing value in the sex column.
penguins |>
dplyr::filter(!is.na(sex))
## # A tibble: 333 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen 36.7 19.3 193 3450
## 5 Adelie Torgersen 39.3 20.6 190 3650
## 6 Adelie Torgersen 38.9 17.8 181 3625
## 7 Adelie Torgersen 39.2 19.6 195 4675
## 8 Adelie Torgersen 41.1 17.6 182 3200
## 9 Adelie Torgersen 38.6 21.2 191 3800
## 10 Adelie Torgersen 34.6 21.1 198 4400
## # … with 323 more rows, and 2 more variables: sex <chr>, year <dbl>
Note that this filtering will keep rows with other column values that are missing values but there will be no penguins where the sex value is NA.
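Because TRUE counts as 1 and FALSE as 0, summing the result of is.na() gives the number of missing values. The following sketch, using the airquality data frame and the made-up object names n_missing and ozone_obs, checks that the rows kept by the filter plus the rows dropped add up to the original number of rows.

```r
# Count the missing ozone values (TRUE counts as 1, FALSE as 0)
n_missing <- sum(is.na(airquality$Ozone))

# Keep only the rows with an observed ozone value
ozone_obs <- airquality |>
  dplyr::filter(!is.na(Ozone))

n_missing
nrow(ozone_obs) + n_missing == nrow(airquality)
```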
Stringing functions together
The function arrange() orders the rows by values given in a particular column.
airquality |>
dplyr::arrange(Solar.R)
## # A tibble: 153 × 6
## Ozone Solar.R Wind Temp Month Day
## <int> <int> <dbl> <int> <int> <int>
## 1 16 7 6.9 74 7 21
## 2 1 8 9.7 59 5 21
## 3 23 13 12 67 5 28
## 4 23 14 9.2 71 9 22
## 5 8 19 20.1 61 5 9
## 6 14 20 16.6 63 9 25
## 7 9 24 13.8 81 8 2
## 8 9 24 10.9 71 9 14
## 9 4 25 9.7 61 5 23
## 10 13 27 10.3 76 9 18
## # … with 143 more rows
The ordering is from lowest value to highest value. Only the first 10 rows are shown. Note that Month and Day are no longer in chronological order.
Repeat but order by the value of air temperature.
airquality |>
dplyr::arrange(Temp)
## # A tibble: 153 × 6
## Ozone Solar.R Wind Temp Month Day
## <int> <int> <dbl> <int> <int> <int>
## 1 NA NA 14.3 56 5 5
## 2 6 78 18.4 57 5 18
## 3 NA 66 16.6 57 5 25
## 4 NA NA 8 57 5 27
## 5 18 65 13.2 58 5 15
## 6 NA 266 14.9 58 5 26
## 7 19 99 13.8 59 5 8
## 8 1 8 9.7 59 5 21
## 9 8 19 20.1 61 5 9
## 10 4 25 9.7 61 5 23
## # … with 143 more rows
Importantly you can string the functions together. For example select the variables radiation, wind, and temperature then filter by temperatures above 90F and arrange from coolest to warmest by temperature.
airquality |>
dplyr::select(Solar.R, Wind, Temp) |>
dplyr::filter(Temp > 90) |>
dplyr::arrange(Temp)
## # A tibble: 14 × 3
## Solar.R Wind Temp
## <int> <dbl> <int>
## 1 291 14.9 91
## 2 167 6.9 91
## 3 250 9.2 92
## 4 267 6.3 92
## 5 272 5.7 92
## 6 222 8.6 92
## 7 197 5.1 92
## 8 259 10.9 93
## 9 183 2.8 93
## 10 189 4.6 93
## 11 225 2.3 94
## 12 188 6.3 94
## 13 237 6.3 96
## 14 203 9.7 97
The result is a data frame with three columns and 14 rows arranged by increasing temperatures above 90F.
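To see what the pipe buys you, here is a sketch of the same three steps written as nested function calls. The nested version has to be read from the inside out, while the piped version reads top to bottom in the order the steps are applied. The object names piped and nested are made up for illustration.

```r
# Steps chained with the pipe, read top to bottom
piped <- airquality |>
  dplyr::select(Solar.R, Wind, Temp) |>
  dplyr::filter(Temp > 90) |>
  dplyr::arrange(Temp)

# The same steps as nested calls, read from the inside out
nested <- dplyr::arrange(
  dplyr::filter(
    dplyr::select(airquality, Solar.R, Wind, Temp),
    Temp > 90
  ),
  Temp
)

all.equal(piped, nested)
```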
The mutate() function adds new columns to the data frame.

For example, create a new column called TempC as the temperature in degrees Celsius. Also create a column called WindMS as the wind speed in meters per second.
airquality |>
dplyr::mutate(TempC = (Temp - 32) * 5/9,
WindMS = Wind * .44704)
## # A tibble: 153 × 8
## Ozone Solar.R Wind Temp Month Day TempC WindMS
## <int> <int> <dbl> <int> <int> <int> <dbl> <dbl>
## 1 41 190 7.4 67 5 1 19.4 3.31
## 2 36 118 8 72 5 2 22.2 3.58
## 3 12 149 12.6 74 5 3 23.3 5.63
## 4 18 313 11.5 62 5 4 16.7 5.14
## 5 NA NA 14.3 56 5 5 13.3 6.39
## 6 28 NA 14.9 66 5 6 18.9 6.66
## 7 23 299 8.6 65 5 7 18.3 3.84
## 8 19 99 13.8 59 5 8 15 6.17
## 9 8 19 20.1 61 5 9 16.1 8.99
## 10 NA 194 8.6 69 5 10 20.6 3.84
## # … with 143 more rows
The resulting data frame has 8 columns (two new ones) labeled TempC and WindMS.
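Within a single mutate() call, a column created earlier in the call can be used to define a later one. As a small sketch, the column TempK (temperature in kelvin, a name chosen here for illustration) is computed from TempC in the same call.

```r
air_units <- airquality |>
  # TempK is defined from TempC, which is created one line earlier
  dplyr::mutate(TempC = (Temp - 32) * 5/9,
                TempK = TempC + 273.15) |>
  dplyr::select(Temp, TempC, TempK)

head(air_units, 3)
```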
On days when the temperature is below 60 F add a column giving the apparent temperature based on the cooling effect of the wind (wind chill) and then arrange from coldest to warmest apparent temperature.
airquality |>
dplyr::filter(Temp < 60) |>
dplyr::mutate(TempAp = 35.74 + .6215 * Temp - 35.75 * Wind^.16 + .4275 * Temp * Wind^.16) |>
dplyr::arrange(TempAp)
## # A tibble: 8 × 7
## Ozone Solar.R Wind Temp Month Day TempAp
## <int> <int> <dbl> <int> <int> <int> <dbl>
## 1 NA NA 14.3 56 5 5 52.5
## 2 6 78 18.4 57 5 18 53.0
## 3 NA 66 16.6 57 5 25 53.3
## 4 NA 266 14.9 58 5 26 54.9
## 5 18 65 13.2 58 5 15 55.2
## 6 NA NA 8 57 5 27 55.3
## 7 19 99 13.8 59 5 8 56.4
## 8 1 8 9.7 59 5 21 57.3
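A long formula inside mutate() can be hard to read. One option, sketched here, is to move the formula into a small function and call that function inside mutate(). The function name wind_chill and the object name cold are made up for illustration. The coefficients are copied from the mutate() call above.

```r
# Hypothetical helper: apparent temperature (F) from air temperature (F)
# and wind speed (mph), using the same coefficients as the mutate() call above
wind_chill <- function(temp_f, wind_mph) {
  35.74 + .6215 * temp_f - 35.75 * wind_mph^.16 + .4275 * temp_f * wind_mph^.16
}

cold <- airquality |>
  dplyr::filter(Temp < 60) |>
  dplyr::mutate(TempAp = wind_chill(Temp, Wind)) |>
  dplyr::arrange(TempAp)

cold
```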
Summarize
The summarize() function reduces (flattens) the data frame based on a function that computes a statistic. For example, to compute the average wind speed during July type
airquality |>
dplyr::filter(Month == 7) |>
dplyr::summarize(Wavg = mean(Wind))
## # A tibble: 1 × 1
## Wavg
## <dbl>
## 1 8.94
Similarly, to compute the average temperature during June type
airquality |>
dplyr::filter(Month == 6) |>
dplyr::summarize(Tavg = mean(Temp))
## # A tibble: 1 × 1
## Tavg
## <dbl>
## 1 79.1
We have seen functions that compute statistics on vectors including sum(), sd(), min(), max(), var(), range(), median(). Others include
| Summary function | Description |
|---|---|
| dplyr::n() | Length of the column |
| dplyr::first() | First value of the column |
| dplyr::last() | Last value of the column |
| dplyr::n_distinct() | Number of distinct values |
Find the maximum and median wind speed and maximum ozone concentration values during the month of May. Also determine the number of observations during May.
airquality |>
dplyr::filter(Month == 5) |>
dplyr::summarize(Wmax = max(Wind),
Wmed = median(Wind),
OzoneMax = max(Ozone),
NumDays = dplyr::n())
## # A tibble: 1 × 4
## Wmax Wmed OzoneMax NumDays
## <dbl> <dbl> <int> <int>
## 1 20.1 11.5 NA 31
Why do you get an NA for OzoneMax?
Fix this by including the argument na.rm = TRUE inside the max() function.
airquality |>
dplyr::filter(Month == 5) |>
dplyr::summarize(Wmax = max(Wind),
Wmed = median(Wind),
OzoneMax = max(Ozone, na.rm = TRUE),
NumDays = dplyr::n())
## # A tibble: 1 × 4
## Wmax Wmed OzoneMax NumDays
## <dbl> <dbl> <int> <int>
## 1 20.1 11.5 115 31
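When you remove missing values inside a summary function it is good practice to also report how many values were removed. A minimal sketch for the same May subset follows. The column name nMissing and the object name may_ozone are made up for illustration.

```r
may_ozone <- airquality |>
  dplyr::filter(Month == 5) |>
  dplyr::summarize(OzoneMax = max(Ozone, na.rm = TRUE),
                   nMissing = sum(is.na(Ozone)),   # number of NA values dropped
                   NumDays = dplyr::n())

may_ozone
```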
Grouping
If you want to summarize separately for each month you use the group_by() function. You split the data frame by some variable (e.g., Month), apply a function to the individual data frames, and then combine the output.
Find the highest ozone concentration by month. Include the number of observations (days) in the month.
airquality |>
dplyr::group_by(Month) |>
dplyr::summarize(OzoneMax = max(Ozone, na.rm = TRUE),
NumDays = dplyr::n())
## # A tibble: 5 × 3
## Month OzoneMax NumDays
## <int> <int> <int>
## 1 5 115 31
## 2 6 71 30
## 3 7 135 31
## 4 8 168 31
## 5 9 96 30
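If all you need is the number of rows in each group, the function dplyr::count() combines the group_by() and summarize() steps. The sketch below shows both spellings. The object names by_group and by_count are made up for illustration.

```r
# Group, then count the rows in each group
by_group <- airquality |>
  dplyr::group_by(Month) |>
  dplyr::summarize(n = dplyr::n())

# The same counts in one step
by_count <- airquality |>
  dplyr::count(Month)

by_count
```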
Find the average ozone concentration when temperatures are above and below 70 F. Include the number of observations (days) in the two groups.
airquality |>
dplyr::group_by(Temp >= 70) |>
dplyr::summarize(OzoneAvg = mean(Ozone, na.rm = TRUE),
NumDays = dplyr::n())
## # A tibble: 2 × 3
## `Temp >= 70` OzoneAvg NumDays
## <lgl> <dbl> <int>
## 1 FALSE 18.0 32
## 2 TRUE 49.1 121
On average, ozone concentration is higher on warm days (Temp >= 70 F). Said another way, mean ozone concentration statistically depends on temperature.
The mean is a model for the data. The statistical dependency of the mean implies that a model for ozone concentration will likely be improved by including temperature as an explanatory variable.
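The grouping column in the output above is labeled with the expression itself (`Temp >= 70`), which is awkward to refer to later. As a sketch, you can give the grouping expression a name inside group_by(). The name Warm and the object name ozone_by_warmth are made up for illustration.

```r
ozone_by_warmth <- airquality |>
  # Name the logical grouping variable instead of using the raw expression
  dplyr::group_by(Warm = Temp >= 70) |>
  dplyr::summarize(OzoneAvg = mean(Ozone, na.rm = TRUE),
                   NumDays = dplyr::n())

ozone_by_warmth
```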
To summarize, the important verbs are
| Verb | Description |
|---|---|
| dplyr::select() | selects columns; pick variables by their names |
| dplyr::filter() | filters rows; pick observations by their values |
| dplyr::mutate() | creates new columns; create new variables with functions of existing variables |
| dplyr::summarize() | summarizes values; collapse many values down to a single summary |
| dplyr::group_by() | allows operations to be grouped |
The syntax of the verb functions is the same:
Properties
* The first argument is a data frame. This argument is implicit when using the |> operator.
* The subsequent arguments describe what to do with the data frame. You refer to columns in the data frame directly (without using $).
* The result is a new data frame.
These properties make it easy to chain together many simple lines of code to do something complex.
The five functions form the basis of a grammar for data. At the most basic level, you can only alter a data frame in five useful ways: you can reorder the rows (arrange()), pick observations and variables of interest (filter() and select()), add new variables that are functions of existing variables (mutate()), or collapse many values to a summary (summarise()).
Your turn
Consider again the Florida precipitation data set (http://myweb.fsu.edu/jelsner/temp/data/FLprecip.txt). Import the data as a data frame, select the columns Apr and Year, group by years > 1960, then compute the mean and variance of the April rainfall with the summarize() function.
Tuesday, September 12, 2022
Today
- Examples of data munging with functions from the {dplyr} package
You work with data frames. The functions are verbs. The verbs include:
| Verb | Description |
|---|---|
| dplyr::select() | selects columns; pick variables by their names |
| dplyr::filter() | filters rows; pick observations by their values |
| dplyr::arrange() | reorders rows |
| dplyr::mutate() | creates new columns; create new variables with functions of existing variables |
| dplyr::summarize() | summarizes values; collapse many values down to a single summary |
| dplyr::group_by() | allows operations to be grouped |
The syntax for the verb functions is the same:
Properties
* The first argument is a data frame. This argument is implied when using the |> (pipe) operator (also %>%).
* The subsequent arguments describe what to do with the data frame. You refer to columns in the data frame directly (without using $).
* The result is a new data frame.
The properties make it easy to chain together simple lines of code to do something complex.
The five functions form the basis of a grammar for data. At the most basic level, you can alter a data frame in five useful ways: you can reorder the rows (arrange()), pick observations and variables of interest (filter() and select()), add new variables that are functions of existing variables (mutate()), or collapse many values to a summary (summarise()).
As a review, consider again the Florida precipitation data set (http://myweb.fsu.edu/jelsner/temp/data/FLprecip.txt). Import the data as a data frame, select the columns Apr and Year, group by years > 1960, then summarize by computing the mean and variance of the April rainfall.
FLp.df <- readr::read_table(file = "http://myweb.fsu.edu/jelsner/temp/data/FLprecip.txt")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## Jan = col_double(),
## Feb = col_double(),
## Mar = col_double(),
## Apr = col_double(),
## May = col_double(),
## Jun = col_double(),
## Jul = col_double(),
## Aug = col_double(),
## Sep = col_double(),
## Oct = col_double(),
## Nov = col_double(),
## Dec = col_double()
## )
FLp.df |>
dplyr::select(Apr, Year) |>
dplyr::group_by(Year > 1960) |>
dplyr::summarize(Avg = mean(Apr),
Var = var(Apr))
## # A tibble: 2 × 3
## `Year > 1960` Avg Var
## <lgl> <dbl> <dbl>
## 1 FALSE 3.14 2.61
## 2 TRUE 2.66 2.07
Example 1: New York City flight data
Let’s consider the flights data frame from the package {nycflights13}.
library(nycflights13)
dim(flights)
## [1] 336776 19
The data contains all 336,776 flights that departed NYC in 2013 and comes from the U.S. Bureau of Transportation Statistics. More information is available by typing ?nycflights13.
The object flights is a tibble (tabled data frame). When we have a large data frame it is useful to make it a tibble.
head(flights)
## # A tibble: 6 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## # … with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
The function filter() selects a set of rows in a data frame. How would you select all flights occurring on February 1st?
flights |>
dplyr::filter(month == 2 &
day == 1)
## # A tibble: 926 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 2 1 456 500 -4 652 648
## 2 2013 2 1 520 525 -5 816 820
## 3 2013 2 1 527 530 -3 837 829
## 4 2013 2 1 532 540 -8 1007 1017
## 5 2013 2 1 540 540 0 859 850
## 6 2013 2 1 552 600 -8 714 715
## 7 2013 2 1 552 600 -8 919 910
## 8 2013 2 1 552 600 -8 655 709
## 9 2013 2 1 553 600 -7 833 815
## 10 2013 2 1 553 600 -7 821 825
## # … with 916 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
## # flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
## # distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
The function arrange() reorders the rows. If you provide more than one column name as arguments, each additional column is used to break ties in the values of the preceding columns.
How would you arrange all flights in descending order of departure delay?
flights |>
dplyr::arrange(desc(dep_delay))
## # A tibble: 336,776 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 9 641 900 1301 1242 1530
## 2 2013 6 15 1432 1935 1137 1607 2120
## 3 2013 1 10 1121 1635 1126 1239 1810
## 4 2013 9 20 1139 1845 1014 1457 2210
## 5 2013 7 22 845 1600 1005 1044 1815
## 6 2013 4 10 1100 1900 960 1342 2211
## 7 2013 3 17 2321 810 911 135 1020
## 8 2013 6 27 959 1900 899 1236 2226
## 9 2013 7 22 2257 759 898 121 1026
## 10 2013 12 5 756 1700 896 1058 2020
## # … with 336,766 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
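The tie-breaking behavior of arrange() is easier to see on a small data frame. As a sketch, arrange the airquality data frame by temperature and then by wind speed. The three days with a temperature of 57F are then ordered by increasing wind speed. The object name coolest is made up for illustration.

```r
coolest <- airquality |>
  # Wind is used only to order rows that have the same value of Temp
  dplyr::arrange(Temp, Wind) |>
  dplyr::select(Temp, Wind, Month, Day)

head(coolest, 4)
```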
Often you work with large data sets with many columns but only a few are of interest. The function select() lets you zoom in on an interesting subset of the columns.
How would you create a data frame containing only the dates, carrier, and flight numbers?
df <- flights |>
dplyr::select(year:day, carrier, flight)
df
## # A tibble: 336,776 × 5
## year month day carrier flight
## <int> <int> <int> <chr> <int>
## 1 2013 1 1 UA 1545
## 2 2013 1 1 UA 1714
## 3 2013 1 1 AA 1141
## 4 2013 1 1 B6 725
## 5 2013 1 1 DL 461
## 6 2013 1 1 UA 1696
## 7 2013 1 1 B6 507
## 8 2013 1 1 EV 5708
## 9 2013 1 1 B6 79
## 10 2013 1 1 AA 301
## # … with 336,766 more rows
Note the use of the sequence operator : to get all columns from the column labeled year through the column labeled day.
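The same operator works on any run of adjacent columns, and a minus sign in front of the range drops those columns instead of keeping them. A short sketch with the airquality data frame follows. The object names first_three and no_dates are made up for illustration.

```r
# Keep the columns from Ozone through Wind
first_three <- airquality |>
  dplyr::select(Ozone:Wind)

# Drop the columns from Month through Day and keep the rest
no_dates <- airquality |>
  dplyr::select(-(Month:Day))

names(first_three)
names(no_dates)
```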
How many distinct carriers are there?
df |>
dplyr::distinct(carrier) |>
nrow()
## [1] 16
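You add new columns with the function mutate(). Compute the change in delay during each flight by subtracting the departure delay (minutes) from the arrival delay. With this definition a negative value means the flight made up time in the air, and a positive value means it lost time.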
flights |>
dplyr::mutate(gain = arr_delay - dep_delay) |>
dplyr::select(year:day, carrier, flight, gain) |>
dplyr::arrange(desc(gain))
## # A tibble: 336,776 × 6
## year month day carrier flight gain
## <int> <int> <int> <chr> <int> <dbl>
## 1 2013 11 1 VX 399 196
## 2 2013 4 18 AA 707 181
## 3 2013 8 8 UA 996 165
## 4 2013 7 10 DL 1465 161
## 5 2013 6 27 MQ 3199 157
## 6 2013 7 22 DL 1619 154
## 7 2013 7 1 DL 2395 153
## 8 2013 7 10 EV 4580 150
## 9 2013 7 22 MQ 2793 150
## 10 2013 4 18 AA 2083 148
## # … with 336,766 more rows
Determine the average departure delay.
flights |>
dplyr::summarize(avgDelay = mean(dep_delay, na.rm = TRUE))
## # A tibble: 1 × 1
## avgDelay
## <dbl>
## 1 12.6
Note that if there are missing values in a vector the function mean() needs to include the argument na.rm = TRUE otherwise the output will be NA.
y <- c(5, 6, 7, NA)
mean(y)
## [1] NA
mean(y, na.rm = TRUE)
## [1] 6
You use sample_n() and sample_frac() to take a random sample of rows from the data frame. Take a random sample of five rows from the flights data frame.
flights |>
dplyr::sample_n(5)
## # A tibble: 5 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 11 14 1547 1550 -3 1733 1745
## 2 2013 9 11 1229 1238 -9 1319 1354
## 3 2013 7 31 1451 1452 -1 1725 1747
## 4 2013 1 10 1145 1145 0 1322 1321
## 5 2013 4 27 941 950 -9 1230 1252
## # … with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
Take a random sample of 1% of the rows.
flights |>
dplyr::sample_frac(.01)
## # A tibble: 3,368 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 11 27 1854 1900 -6 2132 2131
## 2 2013 1 25 552 600 -8 644 709
## 3 2013 1 9 658 700 -2 834 839
## 4 2013 4 19 1805 1800 5 1914 1919
## 5 2013 4 23 1600 1545 15 1805 1745
## 6 2013 6 14 1708 1715 -7 1820 1829
## 7 2013 7 27 2358 2359 -1 336 344
## 8 2013 9 10 1512 1453 19 1750 1811
## 9 2013 3 30 1901 1905 -4 2039 2114
## 10 2013 3 15 1910 1905 5 2011 2028
## # … with 3,358 more rows, and 11 more variables: arr_delay <dbl>,
## # carrier <chr>, flight <int>, tailnum <chr>, origin <chr>, dest <chr>,
## # air_time <dbl>, distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
Use the argument replace = TRUE to perform a bootstrap sample. More on this later.
Random samples are important to modern data science.
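Versions of {dplyr} from 1.0.0 onward document sample_n() and sample_frac() as superseded by the function slice_sample(), which handles both cases through its n and prop arguments. Here is a sketch using the airquality data frame. The seed value and the object names are arbitrary choices for illustration.

```r
set.seed(3042)  # arbitrary seed so the same rows are drawn each time

# Five rows chosen at random
five_rows <- airquality |>
  dplyr::slice_sample(n = 5)

# Roughly ten percent of the rows
tenth <- airquality |>
  dplyr::slice_sample(prop = .1)

# Bootstrap sample: as many rows as the original, drawn with replacement
boot <- airquality |>
  dplyr::slice_sample(n = nrow(airquality), replace = TRUE)

nrow(five_rows)
nrow(boot)
```

Setting the seed makes the sample reproducible, which helps when you share code with someone who needs to get the same rows.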
The verbs are powerful when you apply them to groups of observations within a data frame. This is done with the function group_by(). Determine the average arrival delay by airplane (tail number).
flights |>
dplyr::group_by(tailnum) |>
dplyr::summarize(delayAvg = mean(arr_delay, na.rm = TRUE)) |>
dplyr::arrange(desc(delayAvg))
## # A tibble: 4,044 × 2
## tailnum delayAvg
## <chr> <dbl>
## 1 N844MH 320
## 2 N911DA 294
## 3 N922EV 276
## 4 N587NW 264
## 5 N851NW 219
## 6 N928DN 201
## 7 N7715E 188
## 8 N654UA 185
## 9 N665MQ 175.
## 10 N427SW 157
## # … with 4,034 more rows
Determine the number of distinct planes and flights by destination location.
flights |>
dplyr::group_by(dest) |>
dplyr::summarize(planes = dplyr::n_distinct(tailnum),
flights = dplyr::n())
## # A tibble: 105 × 3
## dest planes flights
## <chr> <int> <int>
## 1 ABQ 108 254
## 2 ACK 58 265
## 3 ALB 172 439
## 4 ANC 6 8
## 5 ATL 1180 17215
## 6 AUS 993 2439
## 7 AVL 159 275
## 8 BDL 186 443
## 9 BGR 46 375
## 10 BHM 45 297
## # … with 95 more rows
Repeat but arrange from most to fewest planes.
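One possible way to do this, sketched under the assumption that the {nycflights13} package is installed, is to add an arrange() step to the end of the chain. The object name planes_by_dest is made up for illustration.

```r
library(nycflights13)

planes_by_dest <- flights |>
  dplyr::group_by(dest) |>
  dplyr::summarize(planes = dplyr::n_distinct(tailnum),
                   flights = dplyr::n()) |>
  # desc() reverses the default lowest-to-highest ordering
  dplyr::arrange(dplyr::desc(planes))

planes_by_dest
```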
Example 2: Daily weather data from Tallahassee
Let’s consider another set of data. Daily high and low temperatures and precipitation in Tallahassee.
The file (TLH_SOD1892.csv) is available in this project in the folder data.
Import the data as a data frame.
TLH.df <- readr::read_csv(file = "data/TLH_SOD1892.csv")
## Rows: 47056 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): STATION, NAME
## dbl (10): LATITUDE, LONGITUDE, ELEVATION, PRCP, TAVG, TMAX, TMIN, WDF1, WSF...
## date (1): DATE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The data frame contains daily high (TMAX) and low (TMIN) temperatures and total precipitation (PRCP) from two stations: Airport with STATION identification USW00093805 and downtown with STATION identification USC00088754.
Use the select() function to create a new data frame with only STATION, DATE, PRCP, TMAX and TMIN.
TLH.df <- TLH.df |>
dplyr::select(STATION, DATE, PRCP, TMAX, TMIN)
TLH.df
## # A tibble: 47,056 × 5
## STATION DATE PRCP TMAX TMIN
## <chr> <date> <dbl> <dbl> <dbl>
## 1 USW00093805 1940-03-01 0 72 56
## 2 USW00093805 1940-03-02 0 77 53
## 3 USW00093805 1940-03-03 0.05 73 56
## 4 USW00093805 1940-03-04 0 72 44
## 5 USW00093805 1940-03-05 0 61 45
## 6 USW00093805 1940-03-06 0 66 40
## 7 USW00093805 1940-03-07 0 72 36
## 8 USW00093805 1940-03-08 0 56 41
## 9 USW00093805 1940-03-09 0 60 33
## 10 USW00093805 1940-03-10 0 72 32
## # … with 47,046 more rows
Note that you’ve recycled the name of the data frame. You started with TLH.df containing all the columns and you ended with TLH.df containing only the columns selected.
Then use the filter() function to keep only days at or above 90F. Similarly you recycle the name of the data frame. Use the glimpse() function to take a look at the resulting data frame.
TLH.df <- TLH.df |>
dplyr::filter(TMAX >= 90) |>
dplyr::glimpse()
## Rows: 10,632
## Columns: 5
## $ STATION <chr> "USW00093805", "USW00093805", "USW00093805", "USW00093805", "U…
## $ DATE <date> 1940-05-18, 1940-05-20, 1940-05-21, 1940-05-22, 1940-05-23, 1…
## $ PRCP <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.45, 0.00, 0.…
## $ TMAX <dbl> 91, 92, 94, 93, 93, 90, 90, 91, 91, 91, 92, 95, 95, 95, 93, 91…
## $ TMIN <dbl> 53, 60, 67, 64, 71, 60, 58, 62, 68, 73, 71, 72, 72, 70, 72, 70…
Note that the DATE column is a vector of dates with class Date. If it were instead a vector of character strings, you would convert it to dates with the as.Date() function.
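A minimal sketch of that conversion follows. The character strings are typed in by hand for illustration. Strings in year-month-day order convert without extra arguments, while other layouts need a format string.

```r
# Dates stored as character strings
x <- c("1940-05-18", "1940-05-20")
class(x)

# Year-month-day strings convert without extra arguments
d <- as.Date(x)
class(d)

# Other layouts need a format string describing the order of the parts
d2 <- as.Date("05/18/1940", format = "%m/%d/%Y")
d2 == d[1]
```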
Functions from the {lubridate} package are used to extract information from dates. Here you add columns labeled Year, Month, and Day using the extractor functions year(), month(), etc.
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
TLH.df <- TLH.df |>
dplyr::mutate(Year = year(DATE),
Month = month(DATE),
Day = day(DATE),
DoW = weekdays(DATE))
TLH.df
## # A tibble: 10,632 × 9
## STATION DATE PRCP TMAX TMIN Year Month Day DoW
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <chr>
## 1 USW00093805 1940-05-18 0 91 53 1940 5 18 Saturday
## 2 USW00093805 1940-05-20 0 92 60 1940 5 20 Monday
## 3 USW00093805 1940-05-21 0 94 67 1940 5 21 Tuesday
## 4 USW00093805 1940-05-22 0 93 64 1940 5 22 Wednesday
## 5 USW00093805 1940-05-23 0 93 71 1940 5 23 Thursday
## 6 USW00093805 1940-05-27 0 90 60 1940 5 27 Monday
## 7 USW00093805 1940-05-28 0 90 58 1940 5 28 Tuesday
## 8 USW00093805 1940-06-02 0 91 62 1940 6 2 Sunday
## 9 USW00093805 1940-06-14 0.45 91 68 1940 6 14 Friday
## 10 USW00093805 1940-06-17 0 91 73 1940 6 17 Monday
## # … with 10,622 more rows
Next you keep only the temperature record from the airport. You use the filter() function on the column labeled STATION.
TLH.df <- TLH.df |>
dplyr::filter(STATION == "USW00093805")
Now what if you want to know the number of hot days (90F or higher) by year? You use the group_by() function and count using the n() function.
TLH90.df <- TLH.df |>
dplyr::group_by(Year) |>
dplyr::summarize(nHotDays = dplyr::n())
TLH90.df
## # A tibble: 79 × 2
## Year nHotDays
## <dbl> <int>
## 1 1940 63
## 2 1941 96
## 3 1942 75
## 4 1943 101
## 5 1944 95
## 6 1945 83
## 7 1946 71
## 8 1947 94
## 9 1948 97
## 10 1949 70
## # … with 69 more rows
Note that the group_by() function results in a data frame with the first column the variable used inside the function. In this case it is Year. The next columns are defined by what is in the summarize() function.
Repeat but this time group by Month.
TLH.df |>
dplyr::group_by(Month) |>
dplyr::summarize(nHotDays = dplyr::n())
## # A tibble: 8 × 2
## Month nHotDays
## <dbl> <int>
## 1 3 2
## 2 4 102
## 3 5 778
## 4 6 1523
## 5 7 1794
## 6 8 1746
## 7 9 1119
## 8 10 157
As expected the number of 90F+ days is highest in July and August. Note that you’ve had 90F+ days in October.
Would you expect there to be more hot days on the weekend? How would you check this?
TLH.df |>
dplyr::group_by(Year, DoW) |>
dplyr::summarize(nHotDays = dplyr::n())
## `summarise()` has grouped output by 'Year'. You can override using the `.groups`
## argument.
## # A tibble: 553 × 3
## # Groups: Year [79]
## Year DoW nHotDays
## <dbl> <chr> <int>
## 1 1940 Friday 10
## 2 1940 Monday 10
## 3 1940 Saturday 7
## 4 1940 Sunday 8
## 5 1940 Thursday 9
## 6 1940 Tuesday 11
## 7 1940 Wednesday 8
## 8 1941 Friday 17
## 9 1941 Monday 12
## 10 1941 Saturday 13
## # … with 543 more rows
You can group by more than one variable. Here both Year and DoW are given to the group_by() function, so the count is made for each day of the week within each year.
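To get at the weekend question more directly you could group by a weekend indicator instead of the day name. The sketch below is only an illustration: it uses the dates of the first ten hot days listed in the TLH.df output above rather than the full record, and the names hot, Weekend, and weekend_counts are made up.

```r
# Dates of the first ten hot days listed in the TLH.df output
hot <- data.frame(
  DATE = as.Date(c("1940-05-18", "1940-05-20", "1940-05-21", "1940-05-22",
                   "1940-05-23", "1940-05-27", "1940-05-28", "1940-06-02",
                   "1940-06-14", "1940-06-17"))
)

weekend_counts <- hot |>
  # %u gives the day of the week as "1" (Monday) through "7" (Sunday)
  dplyr::mutate(Weekend = format(DATE, "%u") %in% c("6", "7")) |>
  dplyr::group_by(Weekend) |>
  dplyr::summarize(nHotDays = dplyr::n())

weekend_counts
```

Keep in mind that there are five weekdays for every two weekend days, so a fair comparison divides each count by the number of days of that type.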
Recall that you can also arrange() the data frame ordered according to the values in a particular column.
TLH90.df |>
dplyr::arrange(desc(nHotDays))
## # A tibble: 79 × 2
## Year nHotDays
## <dbl> <int>
## 1 2016 134
## 2 1990 129
## 3 2011 125
## 4 1993 119
## 5 2010 118
## 6 2015 118
## 7 2018 118
## 8 1986 116
## 9 2007 116
## 10 2000 115
## # … with 69 more rows
Putting everything together
Let’s put together your first piece of original research. You know how to import a data file, you know how to manipulate the data frame to compute something of interest, and you know how to make a graph.
Let’s do this for the number of hot days. Let’s say you want a plot of the annual number of hot days in Tallahassee since 1950. Let’s define a hot day as one where the high temperature is at least 90F.
library(ggplot2)
readr::read_csv(file = "data/TLH_SOD1892.csv") |>
dplyr::filter(STATION == "USW00093805",
TMAX >= 90) |>
dplyr::mutate(Year = lubridate::year(DATE)) |>
dplyr::filter(Year >= 1950) |>
dplyr::group_by(Year) |>
dplyr::summarize(nHotDays = dplyr::n()) |>
ggplot(aes(x = Year, y = nHotDays)) +
geom_point() +
geom_smooth() +
scale_y_continuous(limits = c(0, NA)) +
ylab("Number of Days") +
ggtitle("Number of Hot Days in Tallahassee Since 1950",
subtitle = "High Temperature >= 90F") +
theme_minimal()
## Rows: 47056 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): STATION, NAME
## dbl (10): LATITUDE, LONGITUDE, ELEVATION, PRCP, TAVG, TMAX, TMIN, WDF1, WSF...
## date (1): DATE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

You go from data in a file to a plot of interest with a set of functions that are logically ordered and easy to read.
What would you change to make a similar plot for the number of hot nights (say where the minimum temperature fails to drop below 74F)?
readr::read_csv(file = "data/TLH_SOD1892.csv") |>
dplyr::filter(STATION == "USW00093805",
TMIN >= 74) |>
dplyr::mutate(Year = lubridate::year(DATE)) |>
dplyr::filter(Year >= 1950) |>
dplyr::group_by(Year) |>
dplyr::summarize(nHotNights = dplyr::n()) |>
ggplot(aes(x = Year, y = nHotNights)) +
geom_point() +
geom_smooth() +
scale_y_continuous(limits = c(0, NA)) +
ylab("Number of Nights") +
ggtitle("Number of Hot Nights in Tallahassee Since 1950",
subtitle = "Low Temperature >= 74F") +
theme_minimal()
## Rows: 47056 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): STATION, NAME
## dbl (10): LATITUDE, LONGITUDE, ELEVATION, PRCP, TAVG, TMAX, TMIN, WDF1, WSF...
## date (1): DATE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Make a similar plot showing the total precipitation by year.
readr::read_csv(file = "data/TLH_SOD1892.csv") |>
dplyr::filter(STATION == "USW00093805") |>
dplyr::mutate(Year = lubridate::year(DATE)) |>
dplyr::filter(Year >= 1950) |>
dplyr::group_by(Year) |>
dplyr::summarize(TotalPrecip = sum(PRCP)) |>
ggplot(aes(x = Year, y = TotalPrecip)) +
geom_point() +
geom_smooth() +
scale_y_continuous(limits = c(0, NA)) +
ylab("Total Precipitation by Year") +
theme_minimal()
## Rows: 47056 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): STATION, NAME
## dbl (10): LATITUDE, LONGITUDE, ELEVATION, PRCP, TAVG, TMAX, TMIN, WDF1, WSF...
## date (1): DATE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 1 rows containing non-finite values (stat_smooth).
## Warning: Removed 1 rows containing missing values (geom_point).
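The two warnings indicate that one annual total is missing. A plausible cause, assumed here rather than checked against the file, is that sum() returns NA whenever any value it is given is missing. The sketch below uses a made-up vector prcp to show this behavior and the na.rm = TRUE option.

```r
# Made-up daily precipitation values with one missing observation
prcp <- c(0.10, 0.00, NA, 1.25, 0.40)

sum(prcp)                # NA: a single missing value makes the total missing
sum(prcp, na.rm = TRUE)  # 1.75: missing values are dropped before summing
sum(is.na(prcp))         # 1: the number of missing values
```

Dropping missing days can underestimate the annual total, so it is worth reporting how many values are missing alongside the sum.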

Example 3: Food consumption and CO2 emissions
Source: https://www.nu3.de/blogs/nutrition/food-carbon-footprint-index-2018
fc.df <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-18/food_consumption.csv')
## Rows: 1430 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): country, food_category
## dbl (2): consumption, co2_emmission
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(fc.df)
## # A tibble: 6 × 4
## country food_category consumption co2_emmission
## <chr> <chr> <dbl> <dbl>
## 1 Argentina Pork 10.5 37.2
## 2 Argentina Poultry 38.7 41.5
## 3 Argentina Beef 55.5 1712
## 4 Argentina Lamb & Goat 1.56 54.6
## 5 Argentina Fish 4.36 6.96
## 6 Argentina Eggs 11.4 10.5
Consumption is kg/person/year and CO2 emission is kg CO2/person/year.
- How many different countries are in the data frame?
fc.df |>
dplyr::distinct(country) |>
nrow()
## [1] 130
- Arrange the countries from most pork consumption per person to the least pork consumption.
fc.df |>
dplyr::filter(food_category == "Pork") |>
dplyr::select(country, consumption) |>
dplyr::arrange(desc(consumption))
## # A tibble: 130 × 2
## country consumption
## <chr> <dbl>
## 1 Hong Kong SAR. China 67.1
## 2 Austria 52.6
## 3 Germany 51.8
## 4 Spain 48.9
## 5 Poland 46.2
## 6 Lithuania 45.7
## 7 Luxembourg 43.6
## 8 Croatia 42.8
## 9 Czech Republic 41.2
## 10 Belarus 40.4
## # … with 120 more rows
- Arrange the countries from the largest carbon footprint with respect to eating habits to the smallest carbon footprint.
fc.df |>
dplyr::rename(co2_emission = co2_emmission) |>
dplyr::group_by(country) |>
dplyr::summarize(totalEmission = sum(co2_emission)) |>
dplyr::arrange(desc(totalEmission))
## # A tibble: 130 × 2
## country totalEmission
## <chr> <dbl>
## 1 Argentina 2172.
## 2 Australia 1939.
## 3 Albania 1778.
## 4 New Zealand 1751.
## 5 Iceland 1731.
## 6 USA 1719.
## 7 Uruguay 1635.
## 8 Brazil 1617.
## 9 Luxembourg 1598.
## 10 Kazakhstan 1575.
## # … with 120 more rows
Summary
Data munging is a big part of data science. Data science is an iterative cycle:
- Generate questions about your data.
- Search for answers by transforming, visualizing, and modeling the data.
- Use what you learn to refine your questions and/or ask new ones.
You use questions as tools to guide your investigation. When you ask a question, the question focuses your attention on a specific part of your data set and helps you decide what to do.
For additional practice please check out http://r4ds.had.co.nz/index.html.
Cheat sheets: http://rstudio.com/cheatsheets
Thursday, September 14, 2022
Today
- Making graphs
Data visualization is a cornerstone of data science. It gives insights into your data that are not accessible by looking at a spreadsheet or data frame of values.
The {ggplot2} package provides functions to make plots efficiently. The functions are an application of the grammar of graphics, a theory of data visualization due to Leland Wilkinson.
At a basic level, graphics/plots/charts (all interchangeable terms) provide a way to explore the patterns in data: the presence of extreme values, the distributions of individual variables, and the relationships between groups of variables.
Graphics should emphasize the findings and insights you want your audience to understand. This requires a balance.
On the one hand, you want to highlight as many interesting findings as possible. On the other hand, you don’t want to include so much information that it overwhelms the audience.
The grammar of graphics specifies how a plot translates data to attributes and geometric objects.
- Attributes are things like location along an axis, color, shape, and size.
- Geometric objects are things like points, lines, bars, and polygons.
The type of plot depends on the geometric object, which is specified as a function.
Function names for geometric objects begin with geom_. For example, to create a scatter plot of points the geom_point() function is used.
Make the functions from the {ggplot2} package available in your current session.
library(ggplot2)
Bar chart
A simple graph is the bar chart showing the number of cases within each group. Consider again the annual hurricane counts.
Import the data from the file on my website and take a glimpse at the columns.
loc <- "http://myweb.fsu.edu/jelsner/temp/data/US.txt"
LH.df <- readr::read_table(loc)
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## All = col_double(),
## MUS = col_double(),
## G = col_double(),
## FL = col_double(),
## E = col_double()
## )
dplyr::glimpse(LH.df)
## Rows: 166
## Columns: 6
## $ Year <dbl> 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861,…
## $ All <dbl> 1, 3, 0, 2, 1, 2, 1, 1, 1, 3, 2, 0, 0, 0, 2, 1, 1, 0, 4, 2, 3, 0,…
## $ MUS <dbl> 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ G <dbl> 0, 1, 0, 1, 1, 1, 0, 0, 1, 3, 0, 0, 0, 0, 1, 1, 1, 0, 2, 1, 0, 0,…
## $ FL <dbl> 1, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 2, 0,…
## $ E <dbl> 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0,…
Recall that each case is a year and that the function table() returns the number of years for each landfall count.
table(LH.df$All)
##
## 0 1 2 3 4 5 6 7
## 36 50 40 27 6 1 5 1
The number of cases for each count is tallied and displayed below the count. There were 36 cases of 0 hurricanes.
The function geom_bar() creates a bar chart of this frequency table.
ggplot(data = LH.df) +
geom_bar(mapping = aes(x = All))
You begin a plot with the function ggplot() that creates a coordinate system that you add layers to. The first argument of ggplot() is the data frame to use in the graph. So ggplot(data = LH.df) creates an empty graph.
You complete the graph by adding one or more layers. The function geom_bar() adds a layer of bars to your plot, which creates a bar chart.
Each geom_ function takes a mapping argument. This defines how variables in your data frame are mapped to visual properties. The mapping argument is always paired with the aes() function, and the x argument of aes() specifies which variable to map to the x-axis, in this case All. ggplot() looks for the mapped variable in the data argument, in this case LH.df.
The function geom_bar() tables the counts and then maps the number of cases to bars with the bar height proportional to the number of cases. Here the number of cases is the number of years with that many hurricanes.
The functions are applied in order (ggplot() comes before geom_bar()) and are linked with the addition + symbol. In this way you can think of the functions as layers in a GIS.
The bar chart contains the same information as displayed by the function table(). The y-axis label is ‘count’ and x-axis label is the column name.
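You can check that the bars and the table agree by pulling out the values a layer computes. The sketch below assumes the {ggplot2} function layer_data() and uses a small made-up data frame, counts.df, in place of LH.df so that it runs without the data file.

```r
library(ggplot2)

# Hypothetical annual counts standing in for LH.df$All
counts.df <- data.frame(All = c(0, 1, 1, 2, 0, 3, 1, 2, 0, 1))

p <- ggplot(data = counts.df) +
  geom_bar(mapping = aes(x = All))

# layer_data() returns the values computed for the first layer of the plot
bars <- layer_data(p)
bars[, c("x", "count")]

# The same tallies from table()
table(counts.df$All)
```

The count column holds the bar heights, one row per distinct value of All.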
Repeat, this time using Florida hurricane counts. The annual number of Florida hurricanes is given in column FL of the data frame LH.df.
LH.df$FL
## [1] 1 2 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 2 0 2 1 0 1 2 1 0 3 0 2 0 0 0 3 1
## [38] 2 0 0 0 0 1 2 0 3 1 1 1 0 1 0 1 0 0 2 0 0 1 1 1 0 0 0 1 2 1 0 1 0 1 0 0 2
## [75] 1 2 0 2 1 0 0 0 2 2 2 1 0 0 1 0 1 2 0 1 2 1 2 2 1 2 0 0 1 0 0 1 0 0 0 1 0
## [112] 0 0 3 1 2 1 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 2 0 1 0 1 0 0 1 0 0 2 0 0 2
## [149] 1 0 0 0 0 4 3 0 0 0 0 0 0 0 0 0 0 1
The geom_bar() function tables these numbers and plots the frequency as a bar.
ggplot(data = LH.df) +
geom_bar(mapping = aes(x = FL)) +
xlab("Number of Florida Hurricanes (1851-2016)") +
ylab("Number of Years")
Here axis labels are placed on the plot with the functions ylab() and xlab(). With this type of ‘layering’ it’s easy to go from data on the web to a publishable plot.
Pie preference
Thirty graduate students are surveyed about their favorite pie. Categories are (1) chocolate cream, (2) coconut custard, (3) georgia peach, and (4) strawberry. To make a bar chart first create the data as a character vector and then change the vector to a data frame.
pie <- c(rep('chocolate cream', times = 4),
rep('coconut custard', times = 12),
rep('georgia peach', times = 5),
rep('strawberry', times = 9))
piePref.df <- as.data.frame(pie)
Use the function str() to see the column type in the data frame.
str(piePref.df)
## 'data.frame': 30 obs. of 1 variable:
## $ pie: chr "chocolate cream" "chocolate cream" "chocolate cream" "chocolate cream" ...
There is a single column in the data frame with the name pie. It is a character (chr) vector with four distinct values, one for each type of pie. A factor is a categorical vector. It looks like a character vector but its levels can be ordered. This is important when factors are used in statistical models.
Create a table.
table(piePref.df$pie)
##
## chocolate cream coconut custard georgia peach strawberry
## 4 12 5 9
Create a bar chart and specify the axis labels.
ggplot(data = piePref.df) +
geom_bar(mapping = aes(x = pie)) +
xlab("Pie Preference") +
ylab("Number of Students")
This is a good start, but two improvements should be made.
First, the bar order is alphabetical from left to right. This is the default ordering for character vectors or for factor variables created from character vectors. It is much easier to make comparisons if frequency determines the order.
To change the order on the bar chart specify the order of the factor levels on the vector pie.
pie <- factor(pie,
levels = c("coconut custard", "strawberry", "georgia peach", "chocolate cream"))
piePref.df <- as.data.frame(pie)
Now remake the bar chart.
ggplot(data = piePref.df) +
geom_bar(mapping = aes(pie)) +
xlab("Pie Preference") +
ylab("Number of Students")
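Typing the levels by hand works for four categories but is error prone for many. A minimal sketch of one alternative, using the same pie vector: sort the frequency table and use its names as the levels. The object names lev and pieByFreq are made up for this illustration.

```r
pie <- c(rep('chocolate cream', times = 4),
         rep('coconut custard', times = 12),
         rep('georgia peach', times = 5),
         rep('strawberry', times = 9))

# Sort the frequency table from largest to smallest and use its names as levels
lev <- names(sort(table(pie), decreasing = TRUE))
pieByFreq <- factor(pie, levels = lev)

levels(pieByFreq)
```

The levels come out in the same order as the hand-typed version, from the most to the least preferred pie.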
Second, the vertical axis tic labels are fractions. Since the bar heights are counts (integers) the tic labels also should be integers.
To override this default you add a new y-axis layer. The layer is the function scale_y_continuous() where you indicate the lower and upper limits of the axis with the limits = argument and the concatenate function c(). Now remake the bar chart.
ggplot(data = piePref.df) +
geom_bar(mapping = aes(pie)) +
xlab("Pie Preference") +
ylab("Number of Students") +
scale_y_continuous(limits = c(0, 15))
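Setting the limits leaves the choice of tick positions to {ggplot2}. If you want to guarantee integer labels, one option is to also supply the breaks = argument of scale_y_continuous(). The sketch below repeats the pie data so it runs on its own; the spacing of 3 and the names brks and pInt are arbitrary choices for illustration.

```r
library(ggplot2)

pie <- c(rep('chocolate cream', times = 4),
         rep('coconut custard', times = 12),
         rep('georgia peach', times = 5),
         rep('strawberry', times = 9))
piePref.df <- data.frame(pie)

# Integer tick positions from 0 to 15 in steps of 3
brks <- seq(from = 0, to = 15, by = 3)

pInt <- ggplot(data = piePref.df) +
  geom_bar(mapping = aes(x = pie)) +
  xlab("Pie Preference") +
  ylab("Number of Students") +
  scale_y_continuous(limits = c(0, 15), breaks = brks)
pInt
```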
Now the chart is publishable. Options for changing the look of the plot for digital media include colors, orientation, background, etc.
For example, to change the bar color use the fill = argument in the function geom_bar(). To change the orientation of the bars use the layer function coord_flip(), and to change the background use the layer function theme_minimal(). You make changes to the look of the plot with additional layers.
ggplot(data = piePref.df) +
geom_bar(mapping = aes(x = pie), fill = "blue") +
xlab("Pie Preference") +
ylab("Number of Students") +
scale_y_continuous(limits = c(0, 15)) +
coord_flip() +
theme_minimal()
Recall: the fill = argument colors the bars drawn for the variable named in the aes() function, but because it sets a single fixed color it is specified outside the aes() function.
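The difference between setting and mapping a color is easiest to see side by side. The sketch below uses a tiny made-up data frame, pie.df. In the first plot fill is a constant set outside aes(). In the second it is mapped to a variable inside aes(), so each category gets its own color and a legend.

```r
library(ggplot2)

# Made-up data for illustration
pie.df <- data.frame(pie = c("strawberry", "strawberry", "georgia peach"))

# Constant color: fill is set outside aes(), every bar is blue
pConstant <- ggplot(data = pie.df) +
  geom_bar(mapping = aes(x = pie), fill = "blue")

# Mapped color: fill is inside aes(), one color per pie type plus a legend
pMapped <- ggplot(data = pie.df) +
  geom_bar(mapping = aes(x = pie, fill = pie))

pConstant
pMapped
```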
Available colors include
colors()
In the above example you manually reordered the levels in the factor vector pie according to preference. Let’s see how to do this automatically.
Consider storm intensity of tropical cyclones during 2017. First create two vectors, one numeric containing the minimum pressures (millibars) and the other character containing the storm names.
minP <- c(990, 1007, 992, 1007, 1005, 981, 967, 938, 914, 938, 972, 971)
name <- c("Arlene", "Bret", "Cindy", "Don", "Emily", "Franklin", "Gert",
"Harvey", "Irma", "Jose", "Katia", "Lee")
The function reorder() takes a character vector as the first argument and returns a factor whose levels are ordered by the numeric values in the second argument.
reorder(name, minP)
## [1] Arlene Bret Cindy Don Emily Franklin Gert Harvey
## [9] Irma Jose Katia Lee
## attr(,"scores")
## Arlene Bret Cindy Don Emily Franklin Gert Harvey
## 990 1007 992 1007 1005 981 967 938
## Irma Jose Katia Lee
## 914 938 972 971
## 12 Levels: Irma Harvey Jose Gert Lee Katia Franklin Arlene Cindy Emily ... Don
The vector name is in alphabetical order but the factor levels indicate the order of storms from lowest pressure (Irma) to highest pressure (Don).
Using the mutate() function you can add a column to a data frame where the column is a factor with reordered levels.
Note that it is the difference in pressure (deltaP for short) between the air outside the tropical cyclone and the air in the center that causes the winds. Cyclones with a large pressure difference are stronger in terms of their wind speed.
Typically the air outside is about 1014 mb so you compute deltaP and then reorder the tropical cyclone names using this computed variable.
df <- data.frame(name, minP) |>
dplyr::mutate(deltaP = 1014 - minP,
nameOrderedFactor = reorder(name, deltaP))
Finally you plot the bar chart. Since there is no tabulation of the values you use geom_col() instead of geom_bar().
ggplot(data = df) +
geom_col(mapping = aes(x = nameOrderedFactor, y = deltaP)) +
ylab("Pressure Difference [mb]") +
xlab("Atlantic Tropical Cyclones of 2017") +
coord_flip()
Note: geom_bar() plots a bar chart AFTER tabulating a column. geom_col() plots a bar chart on a pre-tabulated column.
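To make the distinction concrete, the sketch below draws the same chart both ways from the pie data: geom_bar() on the raw vector and geom_col() on a table of counts. It assumes dplyr::count() for the tabulation and the {ggplot2} function layer_data() for inspecting bar heights; the object names are made up for illustration.

```r
library(ggplot2)

pie <- c(rep('chocolate cream', times = 4),
         rep('coconut custard', times = 12),
         rep('georgia peach', times = 5),
         rep('strawberry', times = 9))
raw.df <- data.frame(pie)

# Pre-tabulate: one row per pie type with the count in column n
tab.df <- dplyr::count(raw.df, pie)

pBar <- ggplot(data = raw.df) +
  geom_bar(mapping = aes(x = pie))          # counts computed by the layer

pCol <- ggplot(data = tab.df) +
  geom_col(mapping = aes(x = pie, y = n))   # counts supplied in the data

# The bar heights agree
layer_data(pBar)$y
layer_data(pCol)$y
```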
Let’s return to the weather data from Tallahassee.
df <- readr::read_csv(file = "data/TLH_SOD1892.csv") |>
dplyr::filter(STATION == "USW00093805") |>
dplyr::mutate(Year = lubridate::year(DATE),
Month = lubridate::month(DATE)) |>
dplyr::filter(Year >= 1980 & Month == 9) |>
dplyr::group_by(Year) |>
dplyr::summarize(TotalPrecip = sum(PRCP)) |>
dplyr::mutate(Year = reorder(as.factor(Year), TotalPrecip))
## Rows: 47056 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): STATION, NAME
## dbl (10): LATITUDE, LONGITUDE, ELEVATION, PRCP, TAVG, TMAX, TMIN, WDF1, WSF...
## date (1): DATE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(data = df) +
geom_col(mapping = aes(x = Year, y = TotalPrecip)) +
ylab("September Rainfall [in]") +
coord_flip()
Histogram
The histogram is similar to the bar chart except it uses bars to indicate frequency (or proportion) over an interval of continuous values. For instance, with continuous values the function table() is not useful.
x <- rnorm(n = 10)
table(x)
## x
## -1.6032276455739 -1.21105005355005 -0.930597500697453 -0.659731635004227
## 1 1 1 1
## -0.638249710440162 -0.499239870677882 -0.0277773938058075 0.277103773302114
## 1 1 1 1
## 0.526728668221472 0.545780025214686
## 1 1
So neither is a bar plot.
A histogram is made as follows: First a collection of disjoint intervals, called bins, covering the range of data points is chosen. “Disjoint” means no overlap, so the intervals look like (a,b] or [a,b). The interval (a,b] means the interval contains all the values from a to b including b but not a, whereas the interval [a,b) means the interval contains all the values from a to b including a but not b.
Second, the number of data values in each of these intervals is counted. Third, a bar is drawn above the interval so that the area of the bar is proportional to the frequency. If the intervals defining the bins have the same width, then the height of the bar is proportional to the frequency (the number of values inside the interval).
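The counting step can be sketched with the base R functions cut() and table(). The values in x are made up so that several fall exactly on a bin boundary, which shows the difference between (a,b] and [a,b) intervals.

```r
x <- c(0.5, 1, 1.5, 2, 2.5, 3)
breaks <- 0:3

# Intervals closed on the right: (0,1], (1,2], (2,3]
table(cut(x, breaks = breaks, right = TRUE))

# Intervals closed on the left: [0,1), [1,2), [2,3)
# The value 3 falls outside the last interval and is not counted
table(cut(x, breaks = breaks, right = FALSE))
```

With right-closed intervals each bin holds two values. With left-closed intervals the counts are 1, 2, and 2 because the boundary values move up one bin and the value 3 is dropped.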
Let’s return to the Florida precipitation data.
loc <- "http://myweb.fsu.edu/jelsner/temp/data/FLprecip.txt"
FLp.df <- readr::read_table(loc)
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## Jan = col_double(),
## Feb = col_double(),
## Mar = col_double(),
## Apr = col_double(),
## May = col_double(),
## Jun = col_double(),
## Jul = col_double(),
## Aug = col_double(),
## Sep = col_double(),
## Oct = col_double(),
## Nov = col_double(),
## Dec = col_double()
## )
Recall that the columns in the data frame FLp.df are months (variables) and rows are years. All columns, including Year, are read as numeric (double) vectors. Create a histogram of May precipitation.
ggplot(data = FLp.df) +
geom_histogram(mapping = aes(x = May), col = "white") +
xlab("May Precipitation in Florida (in)")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

By default the function geom_histogram() picks 30 bins. Since there are only 118 May values many of the bins have fewer than 5 values.
When making a histogram you need to vary the number of bins before deciding on a final plot. This can be done with the bins = or binwidth = argument. For example, the look of the histogram is improved by halving the default number of bins.
ggplot(data = FLp.df) +
geom_histogram(mapping = aes(x = May), col = "white", bins = 15) +
xlab("May Precipitation in Florida (in)") 
It looks even better by decreasing the number of bins to 11.
ggplot(data = FLp.df) +
geom_histogram(mapping = aes(x = May), col = "white", bins = 11, fill = "green3") +
xlab("May Precipitation in Florida (in)") +
ylab("Number of Years")
Here the fill = argument is used to change color and a ylab() layer is added to make the y-axis label more concise.
The geom_rug() layer adds the location of the data values as tic marks just above the horizontal axis. The col = "white" argument sets the color of the bin boundaries.
ggplot(data = FLp.df) +
geom_histogram(mapping = aes(x = May), col = "white", bins = 11, fill = "green3") +
xlab("May Precipitation in Florida (in)") +
ylab("Number of Years") +
geom_rug(mapping = aes(x = May))
ggplot(data = FLp.df, mapping = aes(x = May)) +
geom_histogram(col = "black", bins = 11, fill = "pink") +
xlab("May Precipitation in Florida (in)") +
ylab("Number of Years") +
geom_rug()
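The binwidth = argument is an alternative to bins =. The sketch below is a minimal illustration that uses simulated values (an assumption made so the example runs without the data file) in place of the May precipitation. It asks for bins one inch wide that start at whole numbers with the boundary = argument of geom_histogram().

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for May precipitation (inches); not the real data
sim.df <- data.frame(May = rgamma(n = 118, shape = 3, rate = 1))

pBW <- ggplot(data = sim.df, mapping = aes(x = May)) +
  geom_histogram(binwidth = 1, boundary = 0, col = "white") +
  xlab("Simulated Precipitation (in)") +
  ylab("Number of Years")
pBW

# Every value lands in exactly one bin, so the bar heights sum to n
sum(layer_data(pBW)$count)
```

Choosing the bin width in data units (here inches) often makes the bars easier to describe than choosing the number of bins.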
Density plot
A density plot is a smoothed histogram with units of probability density on the vertical axis. It’s motivated by the fact that for a continuous variable, the probability that the variable takes on any particular value is 0. Instead you need a range of values over which a probability is defined.
The probability density answers the question, what is the chance that a value falls in a small interval. This chance varies depending on where the value is located within the distribution of all values (e.g., near the middle of the distribution the chance is highest).
ggplot(data = FLp.df) +
geom_density(mapping = aes(x = May)) +
xlab("May Precipitation in Florida (in)") 
The vertical axis is the average chance that rainfall will take on a value along the horizontal axis within a given small interval. The size of the interval is determined by the bandwidth (bw =).
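One way to read the vertical axis is to remember that area under the curve, not height, is probability. The sketch below uses the base R function density() on simulated values (an assumption so that the example runs on its own; it is not the May precipitation) and approximates areas by summing height times grid spacing.

```r
set.seed(1)
# Simulated stand-in for May precipitation (inches); not the real data
x <- rgamma(n = 118, shape = 3, rate = 1)

d <- density(x)        # kernel density estimate on an evenly spaced grid
dx <- diff(d$x[1:2])   # grid spacing

# Area under the whole curve: close to 1
sum(d$y) * dx

# Approximate chance that a value falls between 2 and 3
sum(d$y[d$x >= 2 & d$x <= 3]) * dx
```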
The values along the vertical axis depend on the data units. They can be tricky to interpret. Instead geom_freqpoly() produces a density-like graph where the units on the y-axis are counts as with the histogram.
ggplot(data = FLp.df, aes(x = May)) +
geom_freqpoly(color = "green3", binwidth = 1) +
xlab("May Precipitation in Florida (in)") +
geom_rug()
Box plot
The box plot graphs the summary statistics. These statistics include the minimum value, the maximum value, the 1st & 3rd quartile values, and the median value. The easiest way to create a box plot is to use the function boxplot().
boxplot(FLp.df$May)
The function boxplot() is from the base {graphics} package. It is not a {ggplot2} function. Others from this package include hist() for histograms and plot() for scatter plots.
The base graphics lets you manipulate details of a graph. For example:
boxplot(FLp.df$May,
ylab = "May Precipitation in FL (in)")
f <- fivenum(FLp.df$May)
text(rep(1.3, 5), f, labels = c("Minimum", "1st Quartile",
"Median", "3rd Quartile",
"Maximum"))
text(1.3, 7.792, labels = "Last Value Within\n 1.5xIQR Above 3rd Q")
The box plot illustrates the five numbers graphically. The median is the line through the box. The bottom and top of the box are the 1st and 3rd quartile values. Whiskers extend vertically from the box downward toward the minimum and upward toward the maximum.
If values extend beyond 1.5 times the interquartile range (either above or below the corresponding quartile) the whisker is truncated at the last value within the range and points are used to indicate outliers.
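The whisker rule can be checked numerically with the base R function boxplot.stats(), which returns the values a box plot draws. The vector x is made up so that one value lies well above the rest.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

b <- boxplot.stats(x)
b$stats  # lower whisker, 1st quartile, median, 3rd quartile, upper whisker
b$out    # values beyond the whiskers, drawn as points

# The same rule by hand
f <- fivenum(x)
upperFence <- f[4] + 1.5 * (f[4] - f[2])
max(x[x <= upperFence])  # end of the upper whisker
```

Here the interquartile range is 8 - 3 = 5, so the upper limit is 8 + 7.5 = 15.5. The whisker stops at 9, the last value within that limit, and 30 is plotted as an outlier.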
To make a box plot using the function ggplot() you need a dummy variable for the x argument in the function aes(). This is done with x = "".
ggplot(FLp.df) +
geom_boxplot(mapping = aes(x = "", y = May)) +
xlab("") +
ylab("May Precipitation in Florida (in)")
Side-by-side box plots
Suppose you want to show box plots for each month. In this case you make the x argument in the aes() the name of a column containing the vector of month names.
You first turn the data frame from its native ‘wide’ format to a {ggplot2} friendly ‘long’ format.
Wide format data is called ‘wide’ because it typically has a lot of columns that stretch across your computer screen. Long format data is called ‘long’ because it has fewer columns while preserving all the information. In order to have fewer columns, it has to be longer.
Wide format data are most common. They are convenient for data entry. They let you see more of the data at one time. An example is the FLp.df data frame.
head(FLp.df)
## # A tibble: 6 × 13
## Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1895 3.28 3.24 2.50 4.53 4.25 4.5 7.45 6.10 4.67 3.09 2.65 1.59
## 2 1896 3.93 3.02 2.57 0.498 2.7 11.2 8.22 5.89 4.35 2.96 3.52 2.07
## 3 1897 1.84 6 2.12 4.39 2.28 5.22 7.21 6.83 11.1 4.10 1.75 2.68
## 4 1898 0.704 2.01 1.26 1.32 1.51 3.29 8.95 13.1 5.23 5.88 2.19 3.89
## 5 1899 4.52 5.92 1.90 3.40 1.11 5.80 9.26 6.71 5.13 5.88 0.751 1.94
## 6 1900 3.21 4.37 6.8 4.32 3.89 9.99 7.50 4.49 4.93 5.23 1.22 4.29
The long data format is less familiar. It corresponds to the relational model for storing data used by most modern databases, such as those queried with SQL.
Use the pivot_longer() function from the {tidyr} package to turn the wide data frame into a long data frame. Let’s do it and then decipher what happens.
library(tidyr)
FLpL.df <- FLp.df |>
tidyr::pivot_longer(cols = -Year,
names_to = "Month",
values_to = "Precipitation")
str(FLpL.df)
## tibble [1,404 × 3] (S3: tbl_df/tbl/data.frame)
## $ Year : num [1:1404] 1895 1895 1895 1895 1895 ...
## $ Month : chr [1:1404] "Jan" "Feb" "Mar" "Apr" ...
## $ Precipitation: num [1:1404] 3.28 3.24 2.5 4.53 4.25 ...
Note that the column Month is a character vector. When making a plot using this variable the order will be alphabetical. So instead you change it to a factor vector with levels equal to the month abbreviations.
month.abb
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
FLpL.df <- FLpL.df |>
dplyr::mutate(Month = factor(Month, levels = month.abb))
The pivot_longer() function takes the columns to pivot into a longer format. Here you choose them all EXCEPT the one named after the - sign (Year). All variables are measured (precipitation in units of inches) except Year.
The resulting long data frame has the Year variable in the first column and the remaining column names in the second column, which is called name by default.
You change the default name to Month by specifying the names_to = "Month" argument. The third column contains the corresponding precipitation values, all in a single column called value by default.
You change this default to Precipitation by specifying the values_to = "Precipitation" argument.
Note that you reverse this with the pivot_wider() function.
FLpW.df <- FLpL.df |>
tidyr::pivot_wider(id_cols = Year,
names_from = Month,
values_from = Precipitation)
To help conceptualize what is going on take a look at this gif.
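A small round trip can also help. The sketch below uses a tiny made-up data frame, wide.df, with two years and three months so that every value can be followed by eye from the wide format to the long format and back.

```r
# Tiny made-up wide data frame: one row per year, one column per month
wide.df <- data.frame(Year = c(2001, 2002),
                      Jan = c(3.1, 2.4),
                      Feb = c(1.9, 4.0),
                      Mar = c(5.2, 0.7))

long.df <- wide.df |>
  tidyr::pivot_longer(cols = -Year,
                      names_to = "Month",
                      values_to = "Precipitation")
long.df          # 2 years x 3 months = 6 rows, 3 columns

back.df <- long.df |>
  tidyr::pivot_wider(id_cols = Year,
                     names_from = Month,
                     values_from = Precipitation)
back.df          # same shape and values as wide.df
```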
Then to create the box plot specify that the x-axis be the key variable (here Month) and the y-axis to be the measured variable (here Precipitation).
ggplot(data = FLpL.df) +
geom_boxplot(mapping = aes(x = Month, y = Precipitation)) +
ylab("Precipitation (in)")
This is a climograph.
Each geom_ function is a layer. Data for the layers is specified in the function ggplot() with the data frame argument and the aes() function. To add another layer to the plot with different data you specify the data within the geom function. For example, let’s repeat the climograph of monthly precipitation highlighting the month of May.
You add a geom_boxplot() layer and specify a subset of the data using the subset operator [] when specifying the data = argument.
ggplot(data = FLpL.df,
aes(x = Month, y = Precipitation)) +
geom_boxplot() +
ylab("Precipitation (in)") +
geom_boxplot(data = FLpL.df[FLpL.df$Month == "May", ],
aes(x = Month, y = Precipitation),
fill = "green")
Cheat sheets: https://ggplot2tor.com/cheatsheets/
Additional help: See: https://moderndive.com/2-viz.html
Tuesday, September 19, 2022
Today
- More about making graphs in R
Comparing distributions
Previously you learned how to make a histogram from data. To review, consider again the Florida rainfall data.
Import the data.
loc <- "http://myweb.fsu.edu/jelsner/temp/data/FLprecip.txt"
FLp.df <- readr::read_table(loc, na = "-9.900")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## Year = col_double(),
## Jan = col_double(),
## Feb = col_double(),
## Mar = col_double(),
## Apr = col_double(),
## May = col_double(),
## Jun = col_double(),
## Jul = col_double(),
## Aug = col_double(),
## Sep = col_double(),
## Oct = col_double(),
## Nov = col_double(),
## Dec = col_double()
## )
Then use the ggplot() and geom_histogram() functions to make a histogram of rainfall during March and add a label on the horizontal axis (x-axis). Here you assign the plot to an object called p1. A list object is created in your environment but nothing is plotted until you type the object name.
library(ggplot2)
p1 <- ggplot(data = FLp.df) +
geom_histogram(mapping = aes(x = Mar),
bins = 11,
fill = "green3",
col = "white") +
xlab("March Rainfall in Florida (in)")
p1
The histogram shows the shape of the distribution. The distribution is made up of all 118 years of March rainfall. Most years have rainfall values between 2 and 4 inches. A few years have values that exceed 7.5 inches.
The average, median, standard deviation, minimum, and maximum are obtained as follows:
FLp.df |>
dplyr::select(Mar) |>
dplyr::summarize(avg = mean(Mar),
med = median(Mar),
sd = sd(Mar),
min = min(Mar),
max = max(Mar))
## # A tibble: 1 × 5
## avg med sd min max
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3.66 3.35 1.95 0.496 8.70
The average value is larger than the median value and the histogram is not symmetric. That is, the number of cases with low rainfall exceeds the number of cases with heavy rainfall.
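A small made-up example shows why the average exceeds the median when a few large values stretch the right side of a distribution. The values in x are hypothetical and chosen so the arithmetic is easy to follow.

```r
# Made-up right-skewed values: most are small, one is large
x <- c(1, 2, 3, 3, 3, 4, 5, 15)

mean(x)    # 4.5, pulled upward by the large value
median(x)  # 3, unaffected by how large the largest value is

# Without the large value the two agree
mean(x[x < 15])    # 3
median(x[x < 15])  # 3
```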
The histogram helps us to describe the statistical distribution of the values.
To see this, recall that you can generate values from any distribution. For example, you generate values from a normal (Gaussian) distribution with the rnorm() function by specifying the mean and the standard deviation.
Here you do this using the mean and standard deviation from our rainfall values. Since there are 118 March rainfall values (one for each year) you set n = 118.
nd <- rnorm(n = 118,
mean = 3.65,
sd = 1.95)
nd
## [1] 6.7416884 0.2139696 2.7433698 3.8824960 0.6194203 4.4354257
## [7] 5.4050169 0.8092758 -0.3419746 7.5958964 -2.5483778 5.5816917
## [13] 3.1746390 5.8613094 5.6541227 0.9086666 6.3098643 2.7549266
## [19] 3.8427015 6.3239940 3.4684659 5.3702174 1.4655571 2.6863474
## [25] 4.3661373 7.5019611 4.1326324 3.5311833 3.0550273 3.0955286
## [31] 3.5985192 -1.0867786 -0.4001121 1.8175477 4.3137304 6.5331517
## [37] 3.2289950 1.1430855 3.4980687 2.4295630 2.7532689 4.8229282
## [43] 2.7737379 4.3975801 -1.3843887 5.9203663 1.2241661 3.9597285
## [49] 3.9163964 5.6894578 3.8132714 3.1678973 4.3318985 7.7737934
## [55] 5.9390685 3.5930825 7.1304144 1.2929749 2.6178412 3.6042137
## [61] 5.7586311 4.9468082 1.7642494 2.2107400 1.0706937 2.2323835
## [67] 0.4622967 6.4902349 3.4607394 2.8777164 2.2998399 4.1054467
## [73] 0.1484932 2.1125788 3.7443694 3.3144648 5.9140059 0.5406493
## [79] 6.3481199 3.2473482 4.8429054 4.3916607 6.1921813 3.9140055
## [85] 2.9784233 2.1965664 4.1856974 3.5287084 2.4022348 2.4280930
## [91] 5.0584030 5.7855713 3.7728832 1.2280599 4.3302056 4.2992629
## [97] 6.4965545 4.7139889 5.5938008 2.7929664 1.1824149 2.6774331
## [103] 6.9224717 4.0089298 4.6029910 2.7755865 2.7627575 5.0998429
## [109] 2.2951031 1.6149303 5.5045684 6.8846473 3.2636776 6.2280476
## [115] 1.7441248 4.7856600 4.2775693 5.0885180
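Because rnorm() draws random values, the numbers you get will differ from those printed above each time you run the code. If you want the same values every time, one option is to set the random number seed first with set.seed(). The seed value 3042 below is arbitrary and chosen only for illustration.

```r
set.seed(3042)   # arbitrary seed chosen for illustration
nd1 <- rnorm(n = 118, mean = 3.65, sd = 1.95)

set.seed(3042)   # same seed, same values
nd2 <- rnorm(n = 118, mean = 3.65, sd = 1.95)

identical(nd1, nd2)

# Sample statistics are near, but not equal to, the values you asked for
mean(nd1)
sd(nd1)
```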
Collectively these values look quite a bit like the actual rainfall. Let’s make a histogram from these 118 values and assign it to p2.
df <- data.frame(nd)
p2 <- ggplot(data = df) +
geom_histogram(mapping = aes(x = nd),
bins = 11,
col = "white") +
xlab("Gaussian Distribution")
p2
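Because rnorm() draws random values, the numbers printed above will differ each time the code is run. A minimal sketch, assuming you want draws that can be repeated exactly, sets a seed first. The seed value 42 and the object names nd_a and nd_b are arbitrary choices made for this illustration.

```r
# Assumption: the seed value 42 is arbitrary; any integer works.
set.seed(42)
nd_a <- rnorm(n = 118, mean = 3.65, sd = 1.95)

set.seed(42)
nd_b <- rnorm(n = 118, mean = 3.65, sd = 1.95)

# Same seed and same call, so the two samples are identical.
identical(nd_a, nd_b)

# The sample mean and standard deviation are close to, but not exactly,
# the values requested.
mean(nd_a)
sd(nd_a)
```

Without the call to set.seed() the two samples would almost surely differ.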
Let’s do the same for a set of values from a uniform distribution and from a gamma distribution.
ud <- runif(n = 118,
min = .5,
max = 8.7)
p3 <- ggplot(data = data.frame(ud)) +
geom_histogram(mapping = aes(x = ud),
bins = 11,
col = "white") +
xlab("Uniform Distribution")
gd <- rgamma(n = 118,
shape = 3.2,
rate = .9)
p4 <- ggplot(data = data.frame(gd)) +
geom_histogram(mapping = aes(x = gd),
bins = 11,
col = "white") +
xlab("Gamma Distribution")
Now put all four plots on a single graph. You do this with the {patchwork} package.
The package gives operators like + and / different meanings when applied to ggplot objects.
library(patchwork)
##
## Attaching package: 'patchwork'
## The following object is masked from 'package:MASS':
##
## area
(p1 + p2) / (p3 + p4)
What distribution best matches the shape of the March rainfall values?
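One numerical hint, offered as a sketch rather than as the answer: the rainfall values have a mean larger than the median, so you can check which of the three simulated distributions shares that property. The sample size of 10000, the seed, and the object names below are assumptions made only to keep the comparison stable.

```r
set.seed(1)      # arbitrary seed, used only so the sketch is repeatable
n_big <- 10000   # larger than 118 so the comparison is stable

nd_big <- rnorm(n = n_big, mean = 3.65, sd = 1.95)
ud_big <- runif(n = n_big, min = .5, max = 8.7)
gd_big <- rgamma(n = n_big, shape = 3.2, rate = .9)

# Mean minus median: near zero for symmetric shapes,
# clearly positive for a right-skewed shape.
skew_hint <- c(normal  = mean(nd_big) - median(nd_big),
               uniform = mean(ud_big) - median(ud_big),
               gamma   = mean(gd_big) - median(gd_big))
round(skew_hint, 2)
```

A positive difference points to a long right tail, which is the asymmetry seen in the rainfall histogram.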
Box plots
A box plot graphically illustrates summary statistics. The summary statistics include the minimum value, the maximum value, the 1st & 3rd quartile values, and the median value.
A non-ggplot way to create a box plot is to use the function boxplot(). Here you get a box plot of the May rainfall.
boxplot(FLp.df$May)
The function boxplot() is from the base {graphics} package. Others from this package include hist() for histograms and plot() for scatter plots.
Base graphics functions let you manipulate the details of a graph. For example:
boxplot(FLp.df$May,
ylab = "May Rainfall in FL (in)")
f <- fivenum(FLp.df$May)
text(rep(1.3, 5), f, labels = c("Minimum", "1st Quartile",
"Median", "3rd Quartile",
"Maximum"))
text(1.3, 7.792, labels = "Last Value Within\n 1.5xIQR Above 3rd Q")
The box plot illustrates the five numbers graphically. The median is the line through the box. The bottom and top of the box are the 1st and 3rd quartile values. Whiskers extend vertically from the box downward toward the minimum and upward toward the maximum.
If values extend beyond 1.5 times the interquartile range (either above or below the corresponding quartile) the whisker is truncated at the last value within the range and points are used to indicate outliers.
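The whisker rule can be worked through by hand. The sketch below uses a small hypothetical vector (not the rainfall data) and the hinges returned by fivenum(), which is what boxplot() uses for the box. The names upper_fence, lower_fence, and the other objects are introduced here only for illustration.

```r
# Hypothetical values, chosen so that one point falls outside the whiskers.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

f <- fivenum(x)          # minimum, lower hinge, median, upper hinge, maximum
iqr <- f[4] - f[2]       # spread of the box
upper_fence <- f[4] + 1.5 * iqr
lower_fence <- f[2] - 1.5 * iqr

# Whiskers stop at the most extreme values still inside the fences.
whisker_top <- max(x[x <= upper_fence])
whisker_bottom <- min(x[x >= lower_fence])

# Values beyond the fences are drawn as individual points.
outliers <- x[x > upper_fence | x < lower_fence]

# boxplot.stats() from {grDevices} reports the same whisker ends and outliers.
bs <- boxplot.stats(x)
bs$stats
bs$out
```

Here the box runs from 3 to 8, so the upper fence is 8 + 1.5 x 5 = 15.5. The upper whisker stops at 9 and the value 30 is plotted as a point.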
To make the same box plot using functions from the {ggplot2} package you use the geom_boxplot() layer.
ggplot(data = FLp.df) +
geom_boxplot(mapping = aes(y = May)) +
xlab("") +
ylab("May Rainfall in Florida (in)")
Long data frames
Suppose you want to make a separate box plot for each month. In this case you make the x aesthetic the name of a column containing the vector of month names. The problem is that the month names are column labels rather than a single character vector.
You need to turn the data frame from its native ‘wide’ format to a ‘long’ format. The FLp.df is ‘wide’ because there are separate columns for each month. Wide data are more common because they are convenient for entering data and they let you see more of the data at one time.
head(FLp.df)
## # A tibble: 6 × 13
## Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1895 3.28 3.24 2.50 4.53 4.25 4.5 7.45 6.10 4.67 3.09 2.65 1.59
## 2 1896 3.93 3.02 2.57 0.498 2.7 11.2 8.22 5.89 4.35 2.96 3.52 2.07
## 3 1897 1.84 6 2.12 4.39 2.28 5.22 7.21 6.83 11.1 4.10 1.75 2.68
## 4 1898 0.704 2.01 1.26 1.32 1.51 3.29 8.95 13.1 5.23 5.88 2.19 3.89
## 5 1899 4.52 5.92 1.90 3.40 1.11 5.80 9.26 6.71 5.13 5.88 0.751 1.94
## 6 1900 3.21 4.37 6.8 4.32 3.89 9.99 7.50 4.49 4.93 5.23 1.22 4.29
You can reduce the number of columns by stacking the rainfall values into a single column and then labeling the rows by month. This preserves all the information from the wide format but does so with fewer columns.
The long data format is less familiar. It corresponds to the relational model for storing data used by relational databases (the kind you query with SQL).
Consider the following wide data frame with column names w, x, y, and z.
id w x y z
1  A C E G
2  B D F H
The long data frame version would be
id name value
1  w    A
1  x    C
1  y    E
1  z    G
2  w    B
2  x    D
2  y    F
2  z    H
You use the pivot_longer() function from the {tidyr} package to turn the wide data frame into a long data frame. Let’s do it and then decipher what happens.
FLpL.df <- FLp.df |>
tidyr::pivot_longer(cols = -Year,
names_to = "Month",
values_to = "Rainfall")
str(FLpL.df)
## tibble [1,404 × 3] (S3: tbl_df/tbl/data.frame)
## $ Year : num [1:1404] 1895 1895 1895 1895 1895 ...
## $ Month : chr [1:1404] "Jan" "Feb" "Mar" "Apr" ...
## $ Rainfall: num [1:1404] 3.28 3.24 2.5 4.53 4.25 ...
The pivot_longer() function takes all the columns to pivot into a longer format. Here you chose them all EXCEPT the one named after the - sign (Year). All variables are measured (rainfall in units of inches) except Year.
The resulting long data frame has the Year variable in the first column and the remaining column names as the name variable in the second column. You change the default name to Month by specifying the names_to = "Month" argument. The third column contains the corresponding rainfall values, all in a single column named value. You change that default name by specifying values_to = "Rainfall".
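The default column names are easiest to see on the toy table with columns w, x, y, and z shown above. A minimal sketch, assuming the {tidyr} package is installed; the object names wide.df and long.df are made up for this illustration.

```r
# The toy wide table from above, typed in by hand.
wide.df <- data.frame(id = c(1, 2),
                      w = c("A", "B"),
                      x = c("C", "D"),
                      y = c("E", "F"),
                      z = c("G", "H"))

# No names_to or values_to given, so the defaults "name" and "value" are used.
long.df <- wide.df |>
  tidyr::pivot_longer(cols = -id)

long.df
```

The result has 8 rows (2 ids times 4 pivoted columns) and the three columns id, name, and value.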
Note that the column Month is a character vector. When you make a plot using this variable the order will be alphabetical. So you change the variable from a character vector to a factor vector with levels equal to the month abbreviations.
month.abb
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
FLpL.df <- FLpL.df |>
dplyr::mutate(Month = factor(Month, levels = month.abb))
Note that you can reverse this with the pivot_wider() function.
FLpW.df <- FLpL.df |>
tidyr::pivot_wider(id_cols = Year,
names_from = Month,
values_from = Rainfall)
Then to create the box plot, specify the x aesthetic (x-axis) to be Month and the y aesthetic (y-axis) to be Rainfall.
ggplot(data = FLpL.df) +
geom_boxplot(mapping = aes(x = Month, y = Rainfall)) +
ylab("Rainfall (in)")
The graph shows the variation of rainfall by month.
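The calendar ordering of the months on the horizontal axis depends on the factor levels set earlier. A small sketch with a hypothetical three-month vector shows the difference between a character vector and a factor; the names m_chr and m_fct are used only here.

```r
# Hypothetical three-month example.
m_chr <- c("Jan", "Feb", "Mar")

# Character vectors are ordered alphabetically.
sort(m_chr)

# Factors are ordered by their levels; month.abb is built into R.
m_fct <- factor(m_chr, levels = month.abb)
as.character(sort(m_fct))
```

Sorting the character vector puts Feb first, while sorting the factor keeps Jan, Feb, Mar in calendar order. Plot axes follow the same rule.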
Each geom_ function is a layer. Data for the layer is specified in the function ggplot() with the data frame argument and the aes() function. To add another layer to the plot with different data you specify the data within the geom_ function.
For example, let’s repeat the graph of monthly rainfall highlighting the month of May. First you filter the data frame keeping only rows labeled May and assign this to a new data frame object called May.df.
You then repeat the plot but add another geom_boxplot() layer that includes the argument data = May.df along with the corresponding aes() function. Finally you color the box green.
May.df <- FLpL.df |>
dplyr::filter(Month == "May")
ggplot(data = FLpL.df, aes(x = Month, y = Rainfall)) +
geom_boxplot() +
ylab("Rainfall (in)") +
geom_boxplot(data = May.df,
mapping = aes(x = Month, y = Rainfall),
fill = "green") +
theme_minimal()
Scatter plots
An important graph is the scatter plot, which shows the relationship between two numeric variables. It plots the values of one variable against the values of the other as points \((x_i, y_i)\) in a Cartesian plane.
For example, to show the relationship between April and September values of rainfall you type
ggplot(FLp.df) +
geom_point(mapping = aes(x = Apr, y = Sep)) +
xlab("April Rainfall (in)") +
ylab("September Rainfall (in)")
The plot shows that dry Aprils tend to be followed by dry Septembers and wet Aprils tend to be followed by wet Septembers.
There is a direct (or positive) relationship between the two variables although the points are scattered widely indicating the relationship is loose.
If your goal is to model the relationship, you plot the dependent variable (the variable you are interested in modeling) on the vertical axis.
Here you put the September values on the vertical axis since a predictive model would use April values to predict September values because April comes before September in the calendar year.
If the points have a natural ordering then you use the geom_line() function. For example, to plot the September Rainfall values as a time series type
ggplot(FLp.df) +
geom_line(mapping = aes(x = Year, y = Sep)) +
xlab("Year") +
ylab("September Rainfall (in)")
Rainfall values fluctuate from one September to the next, but there does not appear to be a long-term trend. With time series data it is better to connect the values with lines rather than use points unless values are missing.
Create a plot of the May values of the North Atlantic Oscillation (NAO) with Year on the horizontal axis. Add appropriate axis labels.
loc <- "http://myweb.fsu.edu/jelsner/temp/data/NAO.txt"
NAO.df <- readr::read_table(file = loc)
ggplot(NAO.df, aes(x = Year, y = May)) +
geom_line() +
xlab("Year") +
ylab("North Atlantic Oscillation (s.d.)")
Let’s return to the mpg data frame. The data frame describes different automobiles by manufacturer, model, engine size, mileage, class, etc.
names(mpg)
## [1] "manufacturer" "model" "displ" "year" "cyl"
## [6] "trans" "drv" "cty" "hwy" "fl"
## [11] "class"
Let’s start with a scatter plot showing highway mileage on the vertical axis and engine size on the horizontal axis.
ggplot(mpg) +
geom_point(mapping = aes(x = displ, y = hwy),
color = "blue")
You add a third variable, like class, to a two dimensional scatterplot by mapping it to an aesthetic. An aesthetic is a visual property of the objects in our plot. Aesthetics include things like the size, the shape, or the color of our points. You can display a point in different ways by changing the levels of its aesthetic properties (e.g., changing the level by size, color, type).
For example, you map the colors of our points to the class variable to reveal the class of each car.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ,
y = hwy,
color = class))
To map an aesthetic to a variable, associate the name of the aesthetic to the name of the variable inside aes(). Note that in the earlier plot color = "blue" was specified outside aes(), which sets every point to the same color rather than mapping color to a variable.
ggplot() will automatically assign a unique level of the aesthetic (here a unique color) to each unique value of the variable, a process known as scaling. ggplot() will also add a legend that explains which levels correspond to which values.
The colors show that many of the unusual points are two-seater cars. Sports cars have large engines like SUVs and pickup trucks, but small bodies like midsize and compact cars, which improves their gas mileage.
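The difference between setting an aesthetic and mapping it can be sketched with three versions of the same plot. The object names p_fixed, p_mapped, and p_slip are made up for this illustration; the third version shows a common slip rather than recommended practice.

```r
library(ggplot2)

# Outside aes(): a fixed setting, every point is drawn blue, no legend.
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             color = "blue")

# Inside aes(): a mapping, color varies with the class variable
# and a legend is added.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# A common slip: a constant inside aes() is treated as a one-level variable,
# so the points get a default palette color (not blue) and a legend.
p_slip <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
```

Print each object (for example, type p_slip) to compare the three graphs.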
Facets
One way to add additional variables is with aesthetics. Another way, particularly useful for categorical variables, is to split our plot into facets, subplots that each display one subset of the data.
To facet a plot by a single variable, use facet_wrap(). The first argument of facet_wrap() should be a formula, which you create with ~ (tilde) followed by a variable name (here ‘formula’ is the name of a data structure in R, not a synonym for ‘equation’). The variable that you pass to facet_wrap() should be discrete.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
To facet a plot on the combination of two variables, add facet_grid() to the plot call. The first argument of facet_grid() is also a formula. This time the formula should contain two variable names separated by a ~ with the first variable named varying in the vertical direction and the second varying in the horizontal direction.
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
Here drv refers to the drive train: front-wheel (f), rear-wheel (r) or 4-wheel (4).
Example: Palmer penguins
Let’s return to the penguins data set. You import it as a data frame using the readr::read_csv() function.
loc <- "https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv"
penguins <- readr::read_csv(loc)
## Rows: 344 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): species, island, sex
## dbl (5): bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, year
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(penguins)
## # A tibble: 6 × 8
## species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Adelie Torge… 39.1 18.7 181 3750 male
## 2 Adelie Torge… 39.5 17.4 186 3800 fema…
## 3 Adelie Torge… 40.3 18 195 3250 fema…
## 4 Adelie Torge… NA NA NA NA <NA>
## 5 Adelie Torge… 36.7 19.3 193 3450 fema…
## 6 Adelie Torge… 39.3 20.6 190 3650 male
## # … with 1 more variable: year <dbl>
Here you will visualize the relationship between flipper_length_mm and body_mass_g with respect to each species.
https://towardsdatascience.com/penguins-dataset-overview-iris-alternative-9453bb8c8d95
Start by creating a scatter plot with flipper length on the horizontal axis and body mass on the vertical axis.
ggplot(data = penguins) +
geom_point(aes(x = flipper_length_mm, y = body_mass_g))
## Warning: Removed 2 rows containing missing values (geom_point).

Next, make the color and shape of the points correspond to the species type. Use the colors “darkorange,” “purple,” “cyan4.”
ggplot(data = penguins) +
geom_point(aes(x = flipper_length_mm,
y = body_mass_g,
color = species,
shape = species)) +
scale_color_manual(values = c("darkorange", "purple", "cyan4"))
## Warning: Removed 2 rows containing missing values (geom_point).

Finally, separate the scatter plots by island.
ggplot(data = penguins) +
geom_point(aes(x = flipper_length_mm,
y = body_mass_g,
color = species,
shape = species)) +
scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
facet_wrap(~ island)
## Warning: Removed 2 rows containing missing values (geom_point).

An expository graph
Adding labels and titles turns an exploratory graph into an expository graph. Consider again the mpg dataset and plot highway mileage (hwy) as a function of engine size (displ) with the color of the point layer given by automobile class (class).
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(title = "Fuel efficiency generally decreases with engine size")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The graph title should summarize the main finding. Avoid titles that just describe what the plot is, e.g. “A scatter plot of engine displacement vs. fuel economy.” If you need to add more text use subtitles and captions.
- subtitle = adds additional detail in a smaller font beneath the title.
- caption = adds text at the bottom right of the plot, often used to describe the source of the data.
ggplot(data = mpg,
mapping = aes(displ, hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(se = FALSE) +
labs(title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data are from fueleconomy.gov")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Exporting your graph
When you knit to HTML and a plot is produced it gets output as a png file in our project directory.
You can use the Export button under the Plots tab.
Or you can export the file directly using R code. Here the file gets put into our working directory.
png(file = "Test.png")
p1
dev.off()
Note that the function png() opens the device and the function dev.off() closes it.
You list the files in your working directory with the command dir().
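For ggplot objects another option is the ggsave() function from {ggplot2}, which opens and closes the device for you. This is a sketch under two assumptions: a stand-in plot named p_demo replaces p1, and the file is written to a temporary folder (out_file) so that nothing in your project is overwritten.

```r
library(ggplot2)

# A stand-in plot; in the notes you would pass p1 instead.
p_demo <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Assumption: a temporary file is used so nothing in your project is overwritten.
out_file <- file.path(tempdir(), "Test.png")

ggsave(filename = out_file,
       plot = p_demo,
       width = 6, height = 4, units = "in",
       dpi = 300)

file.exists(out_file)
```

The width, height, and dpi arguments control the size and resolution of the saved image, which is handy when a journal or report has specific figure requirements.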
CHECK OUT {ggdist}
Thursday, September 21, 2022
Today
- Making maps
Simple feature data frames
Geographic visualization of data is important to geographers and environmental scientists. There are many tools for geo visualization from full-scale GIS applications such as ArcGIS and QGIS to web-based tools like Google maps.
Using code to make maps (instead of point and click) has the benefit of transparency and reproducibility.
Simple features (simple feature access) refers to a standard that describes how objects in the real world are represented in computers. Emphasis is on the spatial geometry of the objects.
The standard also describes how such objects are stored in and retrieved from databases, and which geometrical operations are defined for them.
The simple feature standard is implemented in spatial databases (such as PostGIS) and in commercial GIS (e.g., ESRI ArcGIS). R has an implementation in the {sf} package.
One type of spatial data file is called a shapefile. As an example, the U.S. census information at the state and territory level is in a file called cb_2018_us_state_5m.shp. https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
A shapefile encodes points, lines, and polygons in geographic space, and is actually a set of files. Shapefiles appear with a .shp extension and with accompanying files ending in .dbf and .prj.
- .shp stores the geographic coordinates of the geographic features (e.g. country, state, county)
- .dbf stores data associated with the geographic features (e.g. unemployment rates)
- .prj stores information about the projection of the coordinates in the shapefile
To get a shapefile into R all the files need to be in the same folder (directory).
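To see the set of files that make up one shapefile, you can look at the small example that ships with the {sf} package. This sketch assumes {sf} is installed; the North Carolina file nc.shp is the package's own demonstration data, not the census file used in these notes, and the object names are made up here.

```r
# Assumption: the {sf} package is installed; it ships a small example
# shapefile of North Carolina counties named nc.shp.
shp_path <- system.file("shape/nc.shp", package = "sf")

# List every file in that folder that shares the base name "nc".
nc_files <- list.files(dirname(shp_path), pattern = "^nc\\.")
nc_files

# Pointing sf::st_read() at the .shp file picks up the companion files.
nc.sf <- sf::st_read(shp_path, quiet = TRUE)
class(nc.sf)
```

If the .dbf file were moved to another folder the attributes could not be read, which is why the files must stay together.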
As an example, you import the census data with the sf::st_read() function from the {sf} package. You assign to the object USA.sf the contents of the spatial data frame.
USA.sf <- sf::st_read(dsn = "data/cb_2018_us_state_5m")
## Reading layer `cb_2018_us_state_5m' from data source
## `/Users/jameselsner/Desktop/ClassNotes/QG-2022/data/cb_2018_us_state_5m'
## using driver `ESRI Shapefile'
## Simple feature collection with 56 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1473 ymin: -14.55255 xmax: 179.7785 ymax: 71.35256
## Geodetic CRS: NAD83
The output includes information about the file. The object shows up in our environment as a data frame with 56 observations and 10 variables.
Each observation is either a state or territory.
The class() function tells us the type of data frame and the names() function lists the variable names.
class(USA.sf)
## [1] "sf" "data.frame"
names(USA.sf)
## [1] "STATEFP" "STATENS" "AFFGEOID" "GEOID" "STUSPS" "NAME"
## [7] "LSAD" "ALAND" "AWATER" "geometry"
The file is a simple feature (sf) data frame (data.frame). This means it behaves like a data frame but it also contains information about where the observations are located.
The first several columns serve as identifiers. The variable ALAND is the land area (square meters) and the AWATER is the water area (sq. m).
The last column labeled geometry contains information about location stored as a ‘feature.’ The function sf::st_geometry() extracts this column, and printing it lists the first 5 geometries.
sf::st_geometry(USA.sf)
## Geometry set for 56 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1473 ymin: -14.55255 xmax: 179.7785 ymax: 71.35256
## Geodetic CRS: NAD83
## First 5 geometries:
## MULTIPOLYGON (((-104.0535 41.15726, -104.0527 4...
## MULTIPOLYGON (((-122.3283 48.02134, -122.3217 4...
## MULTIPOLYGON (((-109.0502 31.48, -109.0498 31.4...
## MULTIPOLYGON (((-104.0577 44.99743, -104.0502 4...
## MULTIPOLYGON (((-106.6455 31.89867, -106.6408 3...
The geometry type in this case is MULTIPOLYGON.
A feature is an object in the real world. Often features will consist of a set of features. For instance, a tree is a feature but a set of trees in a forest is itself a feature. The trees are represented as points while the forest boundary as a polygon.
Features have a geometry describing where on Earth the feature is located. They also have attributes, which describe other properties of the feature.
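The tree and forest example can be sketched by building simple features by hand. All coordinates and the species attribute below are hypothetical values made up for illustration, and the object names (trees.sfc, trees.sf, ring, forest.sfc) are not used elsewhere in these notes.

```r
# Hypothetical coordinates (longitude, latitude) for three trees.
trees.sfc <- sf::st_sfc(sf::st_point(c(-84.30, 30.44)),
                        sf::st_point(c(-84.29, 30.45)),
                        sf::st_point(c(-84.28, 30.44)),
                        crs = 4269)

# Attributes (species) plus geometry (the points) make a
# simple feature data frame.
trees.sf <- sf::st_sf(species = c("oak", "pine", "oak"),
                      geometry = trees.sfc)

# A hypothetical forest boundary: a closed ring, so the first and last
# vertices are the same.
ring <- rbind(c(-84.31, 30.43),
              c(-84.27, 30.43),
              c(-84.27, 30.46),
              c(-84.31, 30.46),
              c(-84.31, 30.43))
forest.sfc <- sf::st_sfc(sf::st_polygon(list(ring)), crs = 4269)

sf::st_geometry_type(trees.sf)
sf::st_geometry_type(forest.sfc)
```

The trees carry both an attribute column and a POINT geometry, while the forest boundary is a single POLYGON geometry.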
More on spatial data in a few weeks.
Making a boundary map
The functions in the {ggplot2} package work with simple feature data frames to generate maps using the same grammar.
The important function is geom_sf(). This function draws the geometries.
For example, to draw a map showing the state and territorial boundaries first use ggplot() with the data argument specifying the simple feature data frame USA.sf and then add the geom_sf() function as a layer with the + symbol.
library(ggplot2)
ggplot(data = USA.sf) +
geom_sf()
Note: you don’t need the mapping = aes() function. The mapping is assumed based on the fact that there is a geometry column in the simple feature data frame.
The geom_sf() function maps the east-west coordinate to the x aesthetic and the north-south coordinate to the y aesthetic.
The map is not very informative. Let’s zoom into the contiguous states.
What states/territories are there in the data frame USA.sf?
USA.sf$NAME
## [1] "Nebraska"
## [2] "Washington"
## [3] "New Mexico"
## [4] "South Dakota"
## [5] "Texas"
## [6] "California"
## [7] "Kentucky"
## [8] "Ohio"
## [9] "Alabama"
## [10] "Georgia"
## [11] "Wisconsin"
## [12] "Oregon"
## [13] "Pennsylvania"
## [14] "Mississippi"
## [15] "Missouri"
## [16] "North Carolina"
## [17] "Oklahoma"
## [18] "West Virginia"
## [19] "New York"
## [20] "Indiana"
## [21] "Kansas"
## [22] "Idaho"
## [23] "Nevada"
## [24] "Vermont"
## [25] "Montana"
## [26] "Minnesota"
## [27] "North Dakota"
## [28] "Hawaii"
## [29] "Arizona"
## [30] "Delaware"
## [31] "Rhode Island"
## [32] "Colorado"
## [33] "Utah"
## [34] "Virginia"
## [35] "Wyoming"
## [36] "Louisiana"
## [37] "Michigan"
## [38] "Massachusetts"
## [39] "Florida"
## [40] "United States Virgin Islands"
## [41] "Connecticut"
## [42] "New Jersey"
## [43] "Maryland"
## [44] "South Carolina"
## [45] "Maine"
## [46] "New Hampshire"
## [47] "District of Columbia"
## [48] "Guam"
## [49] "Commonwealth of the Northern Mariana Islands"
## [50] "American Samoa"
## [51] "Iowa"
## [52] "Puerto Rico"
## [53] "Arkansas"
## [54] "Tennessee"
## [55] "Illinois"
## [56] "Alaska"
To zoom in you keep only rows corresponding to states (in the lower 48) from the simple feature data frame.
Recall to pick out rows in a data frame you use the dplyr::filter() function from the {dplyr} package.
First you need to get a list of all the states you want to keep. The state.name vector object contains all 50 state names. This is like the month.abb vector you saw earlier.
state.name
## [1] "Alabama" "Alaska" "Arizona" "Arkansas"
## [5] "California" "Colorado" "Connecticut" "Delaware"
## [9] "Florida" "Georgia" "Hawaii" "Idaho"
## [13] "Illinois" "Indiana" "Iowa" "Kansas"
## [17] "Kentucky" "Louisiana" "Maine" "Maryland"
## [21] "Massachusetts" "Michigan" "Minnesota" "Mississippi"
## [25] "Missouri" "Montana" "Nebraska" "Nevada"
## [29] "New Hampshire" "New Jersey" "New Mexico" "New York"
## [33] "North Carolina" "North Dakota" "Ohio" "Oklahoma"
## [37] "Oregon" "Pennsylvania" "Rhode Island" "South Carolina"
## [41] "South Dakota" "Tennessee" "Texas" "Utah"
## [45] "Vermont" "Virginia" "Washington" "West Virginia"
## [49] "Wisconsin" "Wyoming"
Let’s remove the rows corresponding to the names "Alaska" and "Hawaii". These are elements 2 and 11 so you create a new vector object called sn containing only the names of the lower 48.
sn <- state.name[c(-2, -11)]
sn
## [1] "Alabama" "Arizona" "Arkansas" "California"
## [5] "Colorado" "Connecticut" "Delaware" "Florida"
## [9] "Georgia" "Idaho" "Illinois" "Indiana"
## [13] "Iowa" "Kansas" "Kentucky" "Louisiana"
## [17] "Maine" "Maryland" "Massachusetts" "Michigan"
## [21] "Minnesota" "Mississippi" "Missouri" "Montana"
## [25] "Nebraska" "Nevada" "New Hampshire" "New Jersey"
## [29] "New Mexico" "New York" "North Carolina" "North Dakota"
## [33] "Ohio" "Oklahoma" "Oregon" "Pennsylvania"
## [37] "Rhode Island" "South Carolina" "South Dakota" "Tennessee"
## [41] "Texas" "Utah" "Vermont" "Virginia"
## [45] "Washington" "West Virginia" "Wisconsin" "Wyoming"
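Removing elements by position requires knowing that Alaska and Hawaii are elements 2 and 11. An alternative sketch removes them by name with the base R function setdiff(); the object name sn_by_name is made up for this comparison.

```r
# Drop the two states by name rather than by position.
sn_by_name <- setdiff(state.name, c("Alaska", "Hawaii"))

length(sn_by_name)

# Same result as the positional version used above.
identical(sn_by_name, state.name[c(-2, -11)])
```

Removing by name is easier to read and does not break if the order of the vector changes.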
Now you filter the USA.sf data frame keeping only the rows that are listed in the vector of state names. Assign this spatial data frame the name USA_48.sf.
USA_48.sf <- USA.sf |>
dplyr::filter(NAME %in% sn)
The operator %in% returns TRUE for the rows in USA.sf with NAME equal to one of the names in the vector sn, and the dplyr::filter() function keeps these rows.
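A small sketch of what %in% returns, using hypothetical short vectors (candidates and keep) rather than the full data frame.

```r
# Hypothetical short vectors to show what %in% returns.
candidates <- c("Florida", "Guam", "Ohio")
keep <- c("Ohio", "Florida", "Texas")

# One TRUE/FALSE per element of the left-hand vector.
is_kept <- candidates %in% keep
is_kept

# filter() keeps the rows where the test is TRUE; the base R analogue is:
candidates[is_kept]
```

The result has the same length as the left-hand vector, which is why it works as a row-by-row test inside dplyr::filter().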
Now redraw the map using the USA_48.sf simple feature data frame.
ggplot(data = USA_48.sf) +
geom_sf()
Since the map is a ggplot() object, it is modified like any other ggplot() graph. For example, you change the color of the map and the borders as follows.
ggplot(data = USA_48.sf) +
geom_sf(fill = "skyblue",
color = "gray70")
You can filter by state. Here you create a new simple feature data frame called Wisconsin.sf then draw the boundary.
Wisconsin.sf <- USA_48.sf |>
dplyr::filter(NAME == "Wisconsin")
ggplot(data = Wisconsin.sf) +
geom_sf(fill = "palegreen",
color = "black")
Where is the state of Nebraska? Repeat but fill in Nebraska using the color brown.
Nebraska.sf <- USA_48.sf |>
dplyr::filter(NAME == "Nebraska")
ggplot(data = USA_48.sf) +
geom_sf() +
geom_sf(data = Nebraska.sf,
fill = "brown")
You add layers with the + symbol as before.
Boundaries serve as the background canvas for spatial data analysis. You usually need to add data to this canvas. Depending on the type of data, you either overlay it on top of the boundaries or use it to fill in the areas between the boundaries.
Fills
Choropleth maps (heat maps, thematic maps) map data values from a column in the simple feature data frame to the fill aesthetic. The aesthetic assigns colors to the various map areas (e.g. countries, states, counties, zip codes).
Recall the column labeled AWATER contains the water area in square meters. Since the values are very large, first divide by one billion (10^9) to get the values in 1000s of square kilometers. This is done with the mutate() function.
USA_48.sf <- USA_48.sf |>
dplyr::mutate(WaterArea_km2 = AWATER/10^9)
Then create a choropleth map showing the water area by filling the area between the state borders with a color. This is done using the aes() function and the argument fill = WaterArea_km2.
ggplot(data = USA_48.sf) +
geom_sf(aes(fill = WaterArea_km2))
Note how this differs from just drawing the boundaries. In this case you use the aes() function with the fill aesthetic.
The map is not very informative. Michigan, whose water area includes parts of Lakes Michigan, Superior, and Huron, has by far the most water area, while most other states have much less.
To change that use the logarithm of the area. The base 10 logarithm is 0 when the value is 1, 1 when the value is 10, 2 when the value is 100 and so on. This is seen with the log10() function.
log10(c(1, 10, 100, 1000, 10000))
## [1] 0 1 2 3 4
You convert the area to logarithms with the log10() function inside the aes() function as follows.
ggplot(data = USA_48.sf) +
geom_sf(aes(fill = log10(WaterArea_km2))) 
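The arithmetic behind the unit conversion and the logarithm can be checked by hand. The sketch below uses five water areas rounded from the AWATER values that str() prints for the first rows of the data frame; the object names are made up for this check.

```r
# Water areas in square meters, rounded from the AWATER values printed by
# str() in these notes (first five rows of the data frame).
awater_m2 <- c(1.37e9, 1.26e10, 7.29e8, 3.38e9, 1.90e10)

# 1 square kilometer = 10^6 square meters.
awater_km2 <- awater_m2 / 10^6

# Dividing by 10^9 therefore gives thousands of square kilometers.
awater_1000km2 <- awater_m2 / 10^9
all.equal(awater_1000km2, awater_km2 / 1000)

# log10() compresses the range: a roughly 26-fold spread becomes
# about 1.4 log units.
range(awater_1000km2)
diff(range(log10(awater_1000km2)))
```

On the log scale a state with ten times the water area of another differs by one unit, so a single very large value no longer dominates the color scale.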
Another way to make the map more informative is to convert the continuous variable to a discrete variable and map the discrete values.
In the {ggplot2} package the cut_interval() function takes a continuous variable and makes n groups each having an equal range; cut_number() makes n groups with (approximately) equal numbers of observations; and cut_width() makes groups of a specified width.
As an example, if you want a map with 5 colors, with each color covering (approximately) the same number of states, you would use cut_number() and specify n = 5. You do this with the mutate() function to create a new variable (column) called WaterArea_cut.
USA_48.sf <- USA_48.sf |>
dplyr::mutate(WaterArea_cut = cut_number(WaterArea_km2, n = 5))
str(USA_48.sf)
## Classes 'sf' and 'data.frame': 48 obs. of 12 variables:
## $ STATEFP : chr "31" "53" "35" "46" ...
## $ STATENS : chr "01779792" "01779804" "00897535" "01785534" ...
## $ AFFGEOID : chr "0400000US31" "0400000US53" "0400000US35" "0400000US46" ...
## $ GEOID : chr "31" "53" "35" "46" ...
## $ STUSPS : chr "NE" "WA" "NM" "SD" ...
## $ NAME : chr "Nebraska" "Washington" "New Mexico" "South Dakota" ...
## $ LSAD : chr "00" "00" "00" "00" ...
## $ ALAND : num 1.99e+11 1.72e+11 3.14e+11 1.96e+11 6.77e+11 ...
## $ AWATER : num 1.37e+09 1.26e+10 7.29e+08 3.38e+09 1.90e+10 ...
## $ geometry :sfc_MULTIPOLYGON of length 48; first list element: List of 1
## ..$ :List of 1
## .. ..$ : num [1:1516, 1:2] -104 -104 -104 -104 -104 ...
## ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
## $ WaterArea_km2: num 1.372 12.559 0.729 3.383 19.006 ...
## $ WaterArea_cut: Factor w/ 5 levels "[0.489,1.38]",..: 1 5 1 3 5 5 2 4 3 3 ...
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
## ..- attr(*, "names")= chr [1:11] "STATEFP" "STATENS" "AFFGEOID" "GEOID" ...
Essentially you added a new factor variable called WaterArea_cut with five levels, each containing (approximately) the same number of states.
You can go directly to the mapping as follows.
ggplot(data = USA_48.sf) +
geom_sf(aes(fill = WaterArea_cut))
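The difference between the three cut functions shows up clearly on skewed values. A minimal sketch with a hypothetical vector (nine small numbers and one large one, loosely mimicking one state with far more water than the rest); the width of 25 and the boundary of 0 are arbitrary choices for the illustration.

```r
library(ggplot2)

# Hypothetical skewed values: nine small numbers and one large one.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Equal counts per group: breaks follow the quantiles.
table(cut_number(x, n = 5))

# Equal range per group: most values land in the first group.
table(cut_interval(x, n = 5))

# Fixed width of 25 units per group.
table(cut_width(x, width = 25, boundary = 0))
```

With cut_number() each of the five groups holds two values, while with cut_interval() nine of the ten values fall in the first group. That is why cut_number() tends to give a more evenly colored map for skewed data.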
Make a choropleth map displaying the ratio of water area to land area (ALAND) by state.
ggplot(data = USA_48.sf) +
geom_sf(aes(fill = AWATER/ALAND * 100))
Overlays
The USA_48.sf simple feature data frame uses longitude and latitude for its coordinate reference system (CRS). All spatial data frames have a CRS.
To see what CRS a simple feature data frame uses, use the sf::st_crs() function.
sf::st_crs(USA_48.sf)
## Coordinate Reference System:
## User input: NAD83
## wkt:
## GEOGCRS["NAD83",
## DATUM["North American Datum 1983",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["latitude",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["longitude",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4269]]
The Coordinate Reference System information, including the EPSG code (4269) and the corresponding GEOGCRS, DATUM, etc., is given in well-known text (wkt).
Here it specifies a geographic reference system with longitude and latitude and a datum (North American 1983) that describes the sea-level shape of the planet as an ellipsoid.
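The distinction between a geographic CRS and a projected CRS can be sketched with a single point. The use of EPSG:5070 (NAD83 / Conus Albers) is an assumption made only for this illustration; it is not used elsewhere in these notes, and the object names pt and pt_proj are made up here.

```r
# A single point near Tallahassee in NAD83 longitude/latitude (EPSG:4269).
pt <- sf::st_sfc(sf::st_point(c(-84.2809, 30.4381)), crs = 4269)

sf::st_crs(pt)$epsg      # the EPSG code stored with the geometry
sf::st_is_longlat(pt)    # TRUE for a geographic (longitude/latitude) CRS

# Assumption: EPSG:5070 (NAD83 / Conus Albers) is used here only as an
# example of a projected CRS.
pt_proj <- sf::st_transform(pt, crs = 5070)

sf::st_is_longlat(pt_proj)    # FALSE: coordinates are now in meters
sf::st_coordinates(pt_proj)
```

After the transformation the coordinates are no longer degrees, so locations given as longitude and latitude would first need to be converted before being overlaid.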
Because the CRS uses longitude and latitude you can add locations by specifying the geographic coordinates.
For example, suppose you want to overlay the locations of two cities on the map. First you create a data frame containing the longitudes, latitudes, and names of the locations.
Cities.df <- data.frame(long = c(-84.2809, -87.9735),
lat = c(30.4381,43.0115),
names = c("Tallahassee", "Milwaukee"))
class(Cities.df)
## [1] "data.frame"
Next you draw the map as before but add the locations with a point layer and label the locations with a text layer.
ggplot(data = USA_48.sf) +
geom_sf(color = "gray80") +
geom_point(data = Cities.df,
mapping = aes(x = long, y = lat),
size = 2) +
geom_text(data = Cities.df,
mapping = aes(x = long, y = lat, label = names),
nudge_y = 1)
As another example, let’s consider the airports data frame from the {nycflights13} package. The data frame includes information on 1458 airports in the United States including their location with latitude and longitude.
library(nycflights13)
airports
## # A tibble: 1,458 × 8
## faa name lat lon alt tz dst tzone
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 04G Lansdowne Airport 41.1 -80.6 1044 -5 A America/…
## 2 06A Moton Field Municipal Airport 32.5 -85.7 264 -6 A America/…
## 3 06C Schaumburg Regional 42.0 -88.1 801 -6 A America/…
## 4 06N Randall Airport 41.4 -74.4 523 -5 A America/…
## 5 09J Jekyll Island Airport 31.1 -81.4 11 -5 A America/…
## 6 0A9 Elizabethton Municipal Airport 36.4 -82.2 1593 -5 A America/…
## 7 0G6 Williams County Airport 41.5 -84.5 730 -5 A America/…
## 8 0G7 Finger Lakes Regional Airport 42.9 -76.8 492 -5 A America/…
## 9 0P2 Shoestring Aviation Airfield 39.8 -76.6 1000 -5 U America/…
## 10 0S9 Jefferson County Intl 48.1 -123. 108 -8 A America/…
## # … with 1,448 more rows
Each row is an airport and the location of the airport is given in the columns lat and lon. You can make a map without boundaries by drawing a scatter plot with x = lon and y = lat.
ggplot(data = airports,
mapping = aes(x = lon, y = lat)) +
geom_point()
If you only want airports within the continental United States, you first plot the USA_48.sf boundaries, then add the airport locations as a separate point layer, and then use the coord_sf() function to specify the limits of the plot in the longitude direction (xlim) and the latitude direction (ylim).
ggplot(data = USA_48.sf) +
geom_sf(color = "gray80") +
geom_point(data = airports,
aes(x = lon, y = lat)) +
coord_sf(xlim = c(-130, -60),
ylim = c(20, 50)) +
theme_minimal()
Alternatively, you can use sf::st_as_sf() to convert the airports data frame to a simple feature data frame. The argument coords = tells sf::st_as_sf() which columns contain the geographic coordinates of each airport, with the longitude column listed first. You also set the CRS with the crs = argument, here using the EPSG code 4269 corresponding to a geographic CRS (NAD83).
airports.sf <- sf::st_as_sf(airports,
coords = c("lon", "lat"),
crs = 4269)
airports.sf
## Simple feature collection with 1458 features and 6 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -176.646 ymin: 19.72137 xmax: 174.1136 ymax: 72.27083
## Geodetic CRS: NAD83
## # A tibble: 1,458 × 7
## faa name alt tz dst tzone geometry
## * <chr> <chr> <dbl> <dbl> <chr> <chr> <POINT [°]>
## 1 04G Lansdowne Airport 1044 -5 A Amer… (-80.61958 41.13047)
## 2 06A Moton Field Municipa… 264 -6 A Amer… (-85.68003 32.46057)
## 3 06C Schaumburg Regional 801 -6 A Amer… (-88.10124 41.98934)
## 4 06N Randall Airport 523 -5 A Amer… (-74.39156 41.43191)
## 5 09J Jekyll Island Airport 11 -5 A Amer… (-81.42778 31.07447)
## 6 0A9 Elizabethton Municip… 1593 -5 A Amer… (-82.17342 36.37122)
## 7 0G6 Williams County Airp… 730 -5 A Amer… (-84.50678 41.46731)
## 8 0G7 Finger Lakes Regiona… 492 -5 A Amer… (-76.78123 42.88356)
## 9 0P2 Shoestring Aviation … 1000 -5 U Amer… (-76.64719 39.79482)
## 10 0S9 Jefferson County Intl 108 -8 A Amer… (-122.8106 48.05381)
## # … with 1,448 more rows
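The same conversion works for the two-city data frame from the earlier example. The following is a minimal sketch, assuming the {sf} package is installed; the object name Cities.sf is introduced here for illustration only.

```r
library(sf)

# Two-city data frame from the earlier example
Cities.df <- data.frame(long = c(-84.2809, -87.9735),
                        lat = c(30.4381, 43.0115),
                        names = c("Tallahassee", "Milwaukee"))

# Convert to a simple feature data frame; the x (longitude) column
# comes first in coords =, followed by the y (latitude) column
Cities.sf <- st_as_sf(Cities.df,
                      coords = c("long", "lat"),
                      crs = 4269)

# Check the EPSG code of the CRS that was assigned
st_crs(Cities.sf)$epsg
```

Once converted, the cities could be added to a map with geom_sf() in the same way as the airports.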
To graph the points on the map, you use a second geom_sf().
ggplot() +
geom_sf(data = USA_48.sf) +
geom_sf(data = airports.sf, shape = 1) +
coord_sf(xlim = c(-130, -60),
ylim = c(20, 50))
You can change the size or type of symbols on the map. For instance, you can draw a bubble plot (also known as a proportional symbol map) and encode the altitude of the airport through the size = aesthetic.
ggplot() +
  geom_sf(data = USA_48.sf) +
  geom_sf(data = airports.sf, aes(size = alt),
          fill = "grey", color = "black", alpha = .2) +
  coord_sf(xlim = c(-130, -60),
           ylim = c(20, 50)) +
  scale_size_area(guide = "none")

Circle area is proportional to the airport’s altitude (in feet).
Map projections
Depending on how a curved surface is projected onto a 2-D surface (map), at least some features will be distorted. The coord_sf() function from the {ggplot2} package provides a way to adjust projections.
With a geographic projection the longitudes and latitudes are treated as x (horizontal) and y (vertical) coordinates.
Consider again the boundary map of the lower 48 states. Here we get the boundary file using the us_states() function from the {USAboundaries} package and use the filter() function to remove rows corresponding to Hawaii, Alaska, and Puerto Rico.
USA_48.sf <- USAboundaries::us_states() |>
  filter(!state_name %in% c("Hawaii", "Alaska", "Puerto Rico"))
Here you first assign the map to an object called base_map and then render the map to the plot device by typing the object name.
base_map <- ggplot(data = USA_48.sf) +
geom_sf()
base_map
Note the equal spacing between the latitudes and between the longitudes. 1 degree latitude distance equals 1 degree longitude distance. This is called a carto-cartesian (geographic) projection.
You change the projection by specifying the CRS. For example, to change the base map to have a Mercator projection you use the coord_sf() function with crs = "+proj=merc". Using crs = 3857 gives a nearly identical map; EPSG code 3857 is the Web Mercator projection used by most online map services.
base_map +
coord_sf(crs = "+proj=merc") +
ggtitle("Mercator projection")
base_map +
coord_sf(crs = 3857) +
ggtitle("Mercator projection")
Note the distance between the latitudes increases with increasing latitude. Note also the projection is applied to the rendered map and not the simple feature data frame used to create it.
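To change the coordinates stored in the data, rather than only the rendered map, you can use sf::st_transform(). The following is a minimal sketch using a single point (Tallahassee) instead of the state boundaries; the object names pt.sf and pt_merc.sf are introduced here for illustration only.

```r
library(sf)

# A single point in a geographic CRS (NAD83, EPSG 4269)
pt.sf <- st_as_sf(data.frame(long = -84.2809, lat = 30.4381),
                  coords = c("long", "lat"),
                  crs = 4269)

# st_transform() returns a new object with coordinates in the target CRS;
# the original object is left unchanged
pt_merc.sf <- st_transform(pt.sf, crs = 3857)

st_is_longlat(pt.sf)       # TRUE: coordinates are in degrees
st_is_longlat(pt_merc.sf)  # FALSE: coordinates are projected
st_coordinates(pt_merc.sf)
```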
The Mercator projection is widely used, but it makes areas closer to the poles appear larger than the same areas closer to the equator. Greenland appears as large as the continent of Africa. In reality Africa is 14 times larger in area than Greenland.
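The size of this distortion can be worked out with a few lines of arithmetic. The following sketch assumes a spherical Earth, for which the linear scale of the Mercator projection grows as 1 / cos(latitude), so areas grow as the square of that factor. The function name mercator_area_factor is hypothetical and used only for this illustration.

```r
# Area exaggeration of the Mercator projection relative to the equator,
# assuming a spherical Earth: areas are inflated by 1 / cos(latitude)^2
mercator_area_factor <- function(lat_deg) {
  lat_rad <- lat_deg * pi / 180
  1 / cos(lat_rad)^2
}

# Compare the equator with progressively higher latitudes
lats <- c(0, 30, 45, 60, 72)
data.frame(latitude = lats,
           area_factor = round(mercator_area_factor(lats), 2))
```

At 60 degrees latitude the factor is 4, so a region there is drawn with four times the area of an equal-sized region at the equator.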
Other projections require you to specify standard lines (the lines along which the map surface touches or cuts the globe) or a central point of tangency. These include Gall-Peters, Albers equal-area, and Lambert azimuthal.
base_map +
coord_sf(crs = "+proj=cea +lon_0=0 +lat_ts=45") +
ggtitle("Gall-Peters projection")
With this projection states having the same area appear with the same size, but the boundary shapes are distorted.
With the Albers equal-area projection, distortions are smallest between the two standard parallels, which are set here with lat_1 and lat_2.
base_map +
coord_sf(crs = "+proj=aea +lat_1=25 +lat_2=50 +lon_0=-100") +
ggtitle("Albers equal-area projection")
For the lower 48 states a common choice is the USA Contiguous Albers Equal Area Conic, USGS projection (EPSG = 5070 or 102003).
See Kyle Walker’s get CRS.
See the maptiles package https://github.com/riatelab/maptiles/
Why map projections matter. Clip from The West Wing. https://youtu.be/vVX-PrBRtTY
Thursday, September 21, 2022
Today
- Making maps
Maps with tmap
The {tmap} package has functions for creating thematic maps. The syntax is like the syntax of the functions in {ggplot2}. The functions work with a variety of spatial data.
Consider the simple feature data frame called World from the {tmap} package.
library(tmap)
data("World")
str(World)
## Classes 'sf' and 'data.frame': 177 obs. of 16 variables:
## $ iso_a3 : Factor w/ 177 levels "AFG","AGO","ALB",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ name : Factor w/ 177 levels "Afghanistan",..: 1 4 2 166 6 7 5 56 8 9 ...
## $ sovereignt : Factor w/ 171 levels "Afghanistan",..: 1 4 2 159 6 7 5 52 8 9 ...
## $ continent : Factor w/ 8 levels "Africa","Antarctica",..: 3 1 4 3 8 3 2 7 6 4 ...
## $ area : Units: [km^2] num 652860 1246700 27400 71252 2736690 ...
## $ pop_est : num 28400000 12799293 3639453 4798491 40913584 ...
## $ pop_est_dens: num 43.5 10.3 132.8 67.3 15 ...
## $ economy : Factor w/ 7 levels "1. Developed region: G7",..: 7 7 6 6 5 6 6 6 2 2 ...
## $ income_grp : Factor w/ 5 levels "1. High income: OECD",..: 5 3 4 2 3 4 2 2 1 1 ...
## $ gdp_cap_est : num 784 8618 5993 38408 14027 ...
## $ life_exp : num 59.7 NA 77.3 NA 75.9 ...
## $ well_being : num 3.8 NA 5.5 NA 6.5 4.3 NA NA 7.2 7.4 ...
## $ footprint : num 0.79 NA 2.21 NA 3.14 2.23 NA NA 9.31 6.06 ...
## $ inequality : num 0.427 NA 0.165 NA 0.164 ...
## $ HPI : num 20.2 NA 36.8 NA 35.2 ...
## $ geometry :sfc_MULTIPOLYGON of length 177; first list element: List of 1
## ..$ :List of 1
## .. ..$ : num [1:69, 1:2] 61.2 62.2 63 63.2 64 ...
## ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
## ..- attr(*, "names")= chr [1:15] "iso_a3" "name" "sovereignt" "continent" ...
The spatial data frame contains socioeconomic indicators from 177 countries around the world. Each row is one country’s indicators.
You make a map by first specifying the spatial data frame using the tm_shape() function and then you add a layer consistent with the geometry.
For example, if you want a map showing the Happy Planet Index (column name HPI) by country, use the tm_shape() function to identify the spatial data frame World and then add a fill layer with the tm_polygons() function.
The fill is specified by the argument col = indicating the specific column from the data frame. Here use HPI.
tm_shape(shp = World) +
tm_polygons(col = "HPI")
The tm_polygons() function with the argument col = colors the countries based on the values in the column HPI of the World data frame.
Map layers are added with the + operator.
Caution: the column in the data frame World must be specified using quotes "HPI". This is different from the functions in the {ggplot2} package.
To show two thematic maps together, each with a different variable, specify col = c("HPI", "well_being").
The tm_polygons() function splits the values in the specified column into meaningful groups (here 8), and countries with missing values (NA) are colored gray.
More (or fewer) intervals can be requested with the n = argument, but because the cutoff values are chosen at round numbers the number of intervals you get can differ from n.
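The two options above can be combined in one call. The following is a minimal sketch, assuming the version of {tmap} used in these notes, where col = selects the fill variable; newer releases of the package may print deprecation messages for this syntax. The object name two_maps is introduced here for illustration only.

```r
library(tmap)

data("World")

# Two maps side by side, one per column;
# n = 5 requests about five classes for each map
two_maps <- tm_shape(shp = World) +
  tm_polygons(col = c("HPI", "well_being"), n = 5)

# Typing the object name renders the maps
two_maps
```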
Tornado data
Consider the tornado data from the U.S. Storm Prediction Center (SPC). It is downloaded as a shapefile in the directory data/1950-2018-torn-aspath.
A shapefile is imported with the sf::st_read() function from the {sf} package.
Tornadoes.sf <- sf::st_read(dsn = "data/1950-2018-torn-aspath")
## Reading layer `1950-2018-torn-aspath' from data source
## `/Users/jameselsner/Desktop/ClassNotes/QG-2022/data/1950-2018-torn-aspath'
## using driver `ESRI Shapefile'
## Simple feature collection with 63645 features and 22 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -163.53 ymin: 18.13 xmax: -64.9 ymax: 61.02
## Geodetic CRS: WGS 84
The imported object is a simple feature data frame with 63645 features (observations) and 22 fields (variables) plus a geometry column.
Each row (observation) is a unique tornado.
Look inside the simple feature data frame with the glimpse() function from the {dplyr} package.
dplyr::glimpse(Tornadoes.sf)
## Rows: 63,645
## Columns: 23
## $ om <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18…
## $ yr <dbl> 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1950, 1…
## $ mo <dbl> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ dy <dbl> 3, 3, 3, 13, 25, 25, 26, 11, 11, 11, 11, 12, 12, 12, 12, 12, …
## $ date <chr> "1950-01-03", "1950-01-03", "1950-01-03", "1950-01-13", "1950…
## $ time <chr> "11:00:00", "11:55:00", "16:00:00", "05:25:00", "19:30:00", "…
## $ tz <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ st <chr> "MO", "IL", "OH", "AR", "MO", "IL", "TX", "TX", "TX", "TX", "…
## $ stf <dbl> 29, 17, 39, 5, 29, 17, 48, 48, 48, 48, 48, 48, 48, 48, 48, 28…
## $ stn <dbl> 1, 2, 1, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 10, 2, 1, …
## $ mag <dbl> 3, 3, 1, 3, 2, 2, 2, 2, 3, 2, 2, 2, 1, 2, 1, 2, 1, 3, 2, 4, 2…
## $ inj <dbl> 3, 3, 1, 1, 5, 0, 2, 0, 12, 5, 6, 8, 0, 0, 32, 2, 0, 15, 0, 7…
## $ fat <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 3, 0, 3, 0, 18, …
## $ loss <dbl> 6, 5, 4, 3, 5, 5, 0, 4, 4, 5, 5, 4, 4, 4, 5, 4, 0, 5, 3, 5, 5…
## $ closs <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ slat <dbl> 38.77, 39.10, 40.88, 34.40, 37.60, 41.17, 26.88, 29.42, 29.67…
## $ slon <dbl> -90.22, -89.30, -84.58, -94.37, -90.68, -87.33, -98.12, -95.2…
## $ elat <dbl> 38.8300, 39.1200, 40.8801, 34.4001, 37.6300, 41.1701, 26.8800…
## $ elon <dbl> -90.0300, -89.2300, -84.5799, -94.3699, -90.6500, -87.3299, -…
## $ len <dbl> 9.5, 3.6, 0.1, 0.6, 2.3, 0.1, 4.7, 9.9, 12.0, 4.6, 4.5, 8.0, …
## $ wid <dbl> 150, 130, 10, 17, 300, 100, 133, 400, 1000, 100, 67, 833, 233…
## $ fc <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ geometry <LINESTRING [°]> LINESTRING (-90.22 38.77, -..., LINESTRING (-89.3 …
The first 22 columns are variables (attributes). The last column contains the geometry. Information in the geometry column is in well-known text (WKT) format.
Each tornado is coded as a LINESTRING with a start and end location. This is where the tm_shape() function looks for the geographic information.
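To see what a LINESTRING looks like in WKT, you can build one by hand. The following is a minimal sketch that uses the start and end coordinates of the first tornado listed above (slon, slat, elon, elat). The object names track and track.sfc are introduced here for illustration, and EPSG code 4326 is assumed for the WGS 84 CRS reported when the shapefile was read.

```r
library(sf)

# Start and end coordinates of the first tornado, as (longitude, latitude)
track <- st_linestring(rbind(c(-90.22, 38.77),
                             c(-90.03, 38.83)))

# Well-known text representation of the geometry
st_as_text(track)

# Wrap the geometry in a geometry column with a CRS attached
track.sfc <- st_sfc(track, crs = 4326)
st_geometry_type(track.sfc)
```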
Here you make a map showing the tracks of all the tornadoes since 2011. First filter the data frame keeping only tornadoes occurring after the year (yr) 2010.
TornadoesSince2011.sf <-
  Tornadoes.sf |>
  dplyr::filter(yr >= 2011)
Next get a file containing the boundaries of the lower 48 states.
USA_48.sf <- USAboundaries::us_states() |>
  dplyr::filter(!state_name %in% c("Hawaii", "Alaska", "Puerto Rico"))
Then use the tm_shape() function together with the tm_borders() layer to draw the boundaries before adding the tornadoes. The tornadoes are in a separate spatial data frame so you use the tm_shape() function together with the tm_lines() layer.
tm_shape(shp = USA_48.sf, projection = 5070) +
tm_borders() +
tm_shape(shp = TornadoesSince2011.sf) +
tm_lines(col = "red")
The objects named TornadoesSince2011.sf and USA_48.sf are simple feature data frames. You map variables in the data frames as layers with successive calls to the tm_shape() function.
The default projection is geographic (latitude-longitude). You change it with the projection = argument by specifying an EPSG number (or proj4 string). Here you use 5070 corresponding to USA Contiguous Albers Equal Area Conic, USGS (EPSG = 5070 or 102003).
You make the map interactive by first turning on the "view" mode with the tmap_mode() function before running the code.
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(USA_48.sf) +
tm_borders() +
tm_shape(TornadoesSince2011.sf) +
tm_lines(col = "red")